Abstract

A family of codes with a natural two-dimensional structure is presented, inspired by 
an application of RAID type of architectures whose units are solid state drives (SSDs). 
Arrays of SSDs behave differently from arrays of hard disk drives (HDDs), since hard
errors in sectors are common and traditional RAID approaches (like RAID 5 or RAID 
6) may be either insufficient or excessive. An efficient solution to this problem is given 
by the new codes presented, called partial-MDS (PMDS) codes. 

Keywords: Error-correcting codes, flash storage devices, solid state drives, RAID 
architectures, hard errors, MDS codes, array codes, Reed-Solomon codes, Blaum-Roth 
codes. 



1 Introduction 

Consider an array of, say, n storage devices. Each storage device contains a (large) number 
of sectors, each sector protected by an error-correcting code (ECC) dealing with the most 
common errors in the media. However, it may occur that one or more of the storage devices 
experiences a catastrophic failure. In that case, data loss will occur if no further protection is 
implemented. For that reason, the architecture known as Redundant Arrays of Inexpensive
Disks (RAID) was proposed [15].

The way RAID architectures work is by assigning one or more devices to parity. For instance, take n sectors in the same location in each device (we call this set of n sectors a "stripe"): n − 1 sectors carry information, while the nth is the XOR of the n − 1 information sectors. We repeat this for each stripe of sectors in the array. Such an architecture is called a RAID 4 or a RAID 5 type of architecture. In what follows, we will call it RAID 5; the difference between RAID 5 and RAID 4 lies in the distribution of the parity sectors, but we do not address this issue here. A RAID 6 architecture gives protection against two catastrophic failures.
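A minimal sketch of the RAID 5 stripe just described, assuming sectors are modeled as equal-length Python `bytes` objects. The helper names `xor_sectors`, `raid5_parity` and `rebuild` are illustrative only and do not come from any RAID implementation.

```python
from functools import reduce
from operator import xor


def xor_sectors(sectors):
    """Bytewise XOR of equal-length sectors (bytes objects)."""
    return bytes(reduce(xor, column) for column in zip(*sectors))


def raid5_parity(data_sectors):
    """Parity sector of a stripe: the XOR of its n - 1 information sectors."""
    return xor_sectors(data_sectors)


def rebuild(stripe, lost):
    """Recover the single sector at index `lost` by XORing the n - 1 survivors."""
    return xor_sectors([sector for k, sector in enumerate(stripe) if k != lost])


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]   # n - 1 = 3 toy information sectors
stripe = data + [raid5_parity(data)]              # n = 4 sectors
print(stripe[-1].hex())                           # ee22
print(rebuild(stripe, 1) == data[1])              # True
```

With a single XOR parity per stripe, two lost sectors in the same stripe leave one equation in two unknowns, which is the RAID 5 failure mode discussed later in this section.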

From a coding point of view, the model of failures corresponds to erasures, i.e., errors whose location is known [29][30]. It is preferable to use Maximum Distance Separable (MDS) codes for RAID 6 types of architectures: in order to correct two erasures, exactly two parities are needed. There are many choices for MDS codes correcting two or more erasures: we can use Reed-Solomon (RS) codes [31][35], or array codes, like EVENODD [1], RDP [11], X-codes [13], B-codes [12], C-codes [28], Liberation codes [36], and others [5][39].

Architectures like RAID 5 and RAID 6 are efficient when the storage devices are hard disk drives (HDDs). However, when using solid state drives (SSDs) like flash, these types of architectures, by themselves, are either inefficient or wasteful. Arrays of SSDs pose new challenges for code design, so we will spend the rest of this section addressing some of them. Different ways to adapt RAID architectures to SSDs are being considered in the recent literature. For instance, ways to enhance the performance of RAID 5 are described in [2][21]. See also [16][20], where the internal ECC and a RAID type of architecture communicate. In particular, [IB] uses an adaptive method to increase the redundancy when the bit-error rate increases.

Contrary to HDDs, SSDs degrade significantly over time and as a function of the number of writes [33]. As time goes by and the number of writes increases, the likelihood of a hard error in a sector also increases. A hard error occurs mainly when the correction capability of the internal ECC of a sector is exceeded. In general, BCH codes [29][30] are used for the internal ECC of a sector, although many other codes (including LDPC codes with soft decoding) are possible. Moreover, in recent years some remarkable non-traditional approaches for the ECC that exploit the asymmetry of the SSD channel have been developed [3][19][22][23][24][25][26][32].
However, we do not address the internal ECC problem in this paper. The point is, a hard 
error corresponds to an uncorrectable error in a sector. Normally, the ECC is coupled with a 
Cyclic Redundancy Code (CRC), which detects the situation when the ECC miscorrects (the 
ECC has an inherent detection capability that may not be sufficient, hence it often needs to 
be reinforced by the CRC). There are several ways to implement the CRC, but we do not 
address them here. We will assume instead that a hard error means that the information in 
a sector is lost (an erased sector) and that we can detect this situation. 

From the discussion above, we see that, contrary to arrays of HDDs, arrays of SSDs present 
a mixed failure mode: on one hand, we have catastrophic SSD failures, as in the case of
HDDs. On the other hand, we also have hard errors, which in general are silent: their 
existence is unknown until the sectors are accessed. This situation complicates the task of 
a RAID type of architecture. In effect, assume that a catastrophic SSD failure occurs in 
a RAID 5 architecture. Each sector of the failed device is reconstructed by XORing the 
corresponding sectors in each stripe of the surviving devices. However, if there is a stripe 
that in addition has suffered a hard error, such a stripe has two sectors that have failed. 
Since we are using RAID 5, we cannot recover from such an event and data loss will occur. 

A possible solution to the situation above is using a RAID 6 type of architecture, in which 
two SSDs are used for parity. Certainly, this architecture allows for the recovery of two 
erased sectors in the same stripe. However, such a solution is expensive, since it requires an additional whole device to protect against hard errors. Moreover, two hard errors in a stripe, in addition to the catastrophic device failure, would still cause data loss, and such a scenario may not be unlikely, depending on the statistics of errors. We would like a solution intermediate between RAID 5 and RAID 6 that handles hard errors without the need to dedicate a whole second SSD to parity and that, in addition, is able to handle at least two hard errors in the same stripe after a catastrophic failure has occurred.

In order to handle this mixed environment of hard errors with catastrophic failures, we need to take into account the way information is written in SSDs, which is quite different from the way it is done in HDDs. In an SSD, a new write consists of first erasing a number of consecutive sectors and then rewriting all of them. Therefore, the short write operation in arrays of SSDs (like one sector at a time) is not an issue here: each time a new write is done, a group of, say, m sectors in each SSD is erased and then rewritten. So, the parity needs to be recomputed as part of the new write. We can assume that the array consists of m × n blocks (i.e., each block consists of m stripes), repeated one after the other. Each m × n block is an independent unit, and we will show how to compute the parity for each block. Also, each new write consists of writing a number of m × n blocks (this number may be one, depending on the application, the particular SSD used, and other factors). Our goal is to present a family of codes, which we call partial-MDS (PMDS) codes, allowing for the simultaneous correction of catastrophic failures and hard errors.

The paper is organized as follows: in Section 2 we present the theoretical framework as well as the basic definitions. In Section 3 we present our main construction. In Section 4 we study the special case in which the general construction of Section 3 extends RAID 5, and we find the general conditions for such codes to be PMDS. In Section 5 we study specific cases with parameters relevant to applications, each case analyzed in a separate subsection. In Section 6 we present an alternative construction to the one presented in Section 3, we compare the two, and we study some relevant special cases of this second construction. In Section 7 we present a third construction for cases extending RAID 5. This third construction is not as powerful as the previous ones (it cannot handle three erasures in the same stripe) but uses finite fields of smaller size, simplifying the implementation. In Section 8 we compute the probability of data loss when a catastrophic device failure has occurred under different scenarios. We end the paper with some conclusions.

Although the results can be extended to finite fields of arbitrary characteristic, for simplicity, we consider only fields of characteristic 2.

2 Partial-MDS codes 

Consider an $m\times n$ array, each entry of the array consisting of $b$ symbols (we assume that each of the $b$ symbols is a bit for the sake of the description, but in practice it may be a much larger symbol). Each stripe in the array is protected by $r$ parity entries in such a way that any $r$ erasures in the stripe will be recovered. In other words, each stripe of the array constitutes an $[n,\,n-r,\,r+1]$ MDS code. In addition, we will add $s$ extra "global" parities. Those $s$ extra parities may be placed in different ways in the array, but in order to simplify the description we will place them in the last stripe. Being global means that these parities affect all $mn$ entries in the array. For instance, Figure 1 shows a $4\times 5$ array with $r=1$ and $s=2$ such that the two extra global parities are placed in the last stripe.

Figure 1: A 4 x 5 array with r = 1 and s = 2

Figure 2: 4 x 5 arrays with a catastrophic failure and two hard errors

The idea of a partial-MDS code (to be defined formally) is the following: looking at Figure 1, assume that a catastrophic failure occurs (that is, a whole column in the array has failed), and in addition, we have up to two hard errors anywhere in the array. Then we want the code to correct these failures (erasures in coding parlance). The situation is illustrated in Figure 2, where the hard errors are indicated with the letter 'H': the two hard errors may occur either in different stripes or in the same stripe.

A natural way of solving this problem is by using an MDS code. In our $4\times 5$ array example, we have a total of 6 parity sectors. So, it is feasible to implement an MDS code on 20 symbols with 6 parity symbols, in other words, a $[20,14,7]$ MDS code (like an RS code). The problem with this approach is its complexity. The case of a $4\times 5$ array is given for the purpose of illustration, but more typical values of $m$ in applications are $m=16$ and even $m=32$. That would give 18 or 34 parity sectors. Implementing such a code, although feasible, is complex. We want the code, in normal operation, to utilize its underlying RAID structure based on stripes, like single parity in the case of RAID 5. The extra parities are invoked only on rare occasions. So, given this constraint of a horizontal code, we want to establish an optimality criterion for codes, which we will call partial-MDS (PMDS) codes. In the case of the example of RAID 5 plus two global parities, we want the code to correct up to one erasure per stripe and, in addition, two extra erasures anywhere. For example, the code of Figure 1 is PMDS if it can correct any of the situations depicted in Figure 2. Formally,

Definition 2.1 Let $C$ be a linear $[mn,\,m(n-r)-s]$ code over a field or ring such that, when codewords are taken row-wise as $m\times n$ arrays, each row belongs to an $[n,\,n-r,\,r+1]$ MDS code. Given $(s_1,s_2,\ldots,s_t)$ such that each $s_j\ge 1$ and $\sum_{j=1}^{t}s_j=s$, we say that $C$ is $(r;s_1,s_2,\ldots,s_t)$-erasure correcting if, for any $0\le i_1<i_2<\cdots<i_t\le m-1$, $C$ can correct up to $s_j+r$ erasures in each row $i_j$ of an array in $C$. We say that $C$ is an $(r;s)$ partial-MDS (PMDS) code if, for every $(s_1,s_2,\ldots,s_t)$ such that each $s_j\ge 1$ and $\sum_{j=1}^{t}s_j=s$, $C$ is an $(r;s_1,s_2,\ldots,s_t)$-erasure correcting code.
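To make Definition 2.1 concrete, the sketch below counts the maximal erasure patterns an $(r;s)$ PMDS code is asked to handle, under the reading suggested by Figure 2: every row loses $r$ entries, except rows $i_1<\cdots<i_t$, which lose $r+s_j$ entries, for every composition $(s_1,\ldots,s_t)$ of $s$. This reading and the function names are illustrative assumptions; the closed-form count is cross-checked by brute-force enumeration on tiny arrays.

```python
from math import comb


def compositions(s):
    """All ordered tuples (s_1, ..., s_t) of positive integers with sum s."""
    if s == 0:
        yield ()
        return
    for first in range(1, s + 1):
        for rest in compositions(s - first):
            yield (first,) + rest


def count_maximal_patterns(m, n, r, s):
    """Count patterns with r erasures per row, plus s_j extra in rows i_1 < ... < i_t."""
    total = 0
    for parts in compositions(s):
        t = len(parts)
        ways = comb(m, t)                    # choose the rows i_1 < ... < i_t
        for p in parts:
            ways *= comb(n, r + p)           # r + s_j erased entries in row i_j
        ways *= comb(n, r) ** (m - t)        # r erased entries in every other row
        total += ways
    return total


def count_by_enumeration(m, n, r, s):
    """Brute force over all 2^(mn) subsets of cells; only for tiny arrays."""
    total = 0
    for mask in range(1 << (m * n)):
        counts = [bin((mask >> (i * n)) & ((1 << n) - 1)).count("1") for i in range(m)]
        if min(counts) >= r and sum(c - r for c in counts) == s:
            total += 1
    return total


print(count_maximal_patterns(4, 5, 1, 2))   # 20000 = 5000 + 15000
```

For the $4\times 5$ array of Figure 1, the 5000 patterns put three erasures in one row and the 15000 patterns put two erasures in each of two rows, matching the two situations of Figure 2.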

In the next section we give a general construction of codes by providing their $(mr+s)\times mn$ parity-check matrices. Some of these codes are going to be PMDS. In particular, we will analyze the case $r=1$ in Section 4, due to its important practical value, since it extends RAID 5.



3 Code Construction 

As stated in Section 2, our entries consist of $b$ bits. We will assume that each entry is in a ring. The ring is defined by a polynomial $f(x)$ of degree $b$, i.e., the product of two elements in the ring (taken as polynomials of degree up to $b-1$) is the remainder of dividing the product of both elements by $f(x)$ (if $f(x)$ is irreducible, the ring becomes the field $GF(2^b)$ [31]).

Let $\alpha$ be a root of the polynomial $f(x)$ defining the ring. We call the exponent of $f(x)$, denoted $e(f(x))$, the exponent of $\alpha$, i.e., the minimum $\ell$, $0<\ell$, such that $\alpha^{\ell}=1$. If $f(x)$ is primitive [31], $e(f(x))=2^b-1$.
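A minimal sketch of arithmetic modulo $f(x)$ and of the exponent $e(f(x))$, assuming binary polynomials are encoded as Python ints whose bit $k$ is the coefficient of $x^k$ (so $\alpha$ corresponds to the int `0b10`). The function names are hypothetical.

```python
def mulmod(a, b, f):
    """Product of binary polynomials a and b modulo f.

    Polynomials are Python ints whose bit k is the coefficient of x^k;
    a must already be reduced (deg a < deg f).
    """
    deg_f = f.bit_length() - 1
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> deg_f) & 1:      # degree reached deg f: add (XOR) f
            a ^= f
    return result


def exponent(f):
    """e(f): the smallest l > 0 with x^l = 1 modulo f (needs f(0) = 1, deg f >= 2)."""
    if (f & 1) == 0 or f.bit_length() < 3:
        raise ValueError("need f(0) = 1 and deg f >= 2")
    power, l = 0b10, 1            # x^1
    while power != 1:             # terminates because x is invertible modulo f
        power = mulmod(power, 0b10, f)
        l += 1
    return l


print(exponent(0b10011))          # x^4 + x + 1 (octal 23): 15, so it is primitive
print(exponent((1 << 5) - 1))     # M_5(x) = 1 + x + x^2 + x^3 + x^4: 5
```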

A special case that will be important in applications is $f(x)=M_p(x)=1+x+\cdots+x^{p-1}$, $p$ a prime number. In this case, $e(M_p(x))=p$ and $f(x)$ may not be irreducible. In fact, it is not difficult to prove that $f(x)$ is irreducible if and only if 2 is primitive in $GF(p)$. So, the polynomials of degree up to $p-2$ modulo $M_p(x)$ constitute a ring and not generally a field. This ring was used in [7] to construct the Blaum-Roth (BR) codes, and for the rest of the paper, we either assume that $f(x)$ is irreducible or that $f(x)=M_p(x)$, and $p$ will always denote a prime number.
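The equivalence stated above ($M_p(x)$ irreducible if and only if 2 is primitive in $GF(p)$) can be checked numerically for small primes. This is a sketch, assuming binary polynomials encoded as Python ints and a naive trial-division irreducibility test that is only practical for small $p$; the helper names are hypothetical.

```python
def polymod(a, g):
    """Remainder of a modulo g (binary polynomials as ints, bit k <-> x^k)."""
    dg = g.bit_length() - 1
    while a and a.bit_length() - 1 >= dg:
        a ^= g << (a.bit_length() - 1 - dg)
    return a


def is_irreducible(f):
    """Naive trial division by every g with 1 <= deg g <= deg(f) // 2."""
    deg = f.bit_length() - 1
    for d in range(1, deg // 2 + 1):
        for g in range(1 << d, 1 << (d + 1)):       # all g of degree exactly d
            if polymod(f, g) == 0:
                return False
    return True


def order_of_2(p):
    """Multiplicative order of 2 modulo an odd prime p."""
    k, v = 1, 2 % p
    while v != 1:
        v = (v * 2) % p
        k += 1
    return k


def M(p):
    """M_p(x) = 1 + x + ... + x^(p-1)."""
    return (1 << p) - 1


for p in [3, 5, 7, 11, 13, 17, 19, 23]:
    # The two booleans agree: M_p irreducible <=> 2 primitive in GF(p).
    print(p, is_irreducible(M(p)), order_of_2(p) == p - 1)
```

For the primes 31, 73 and 89, which appear as exceptions in Section 5.2, the order of 2 is 5, 9 and 11 respectively, so the corresponding $M_p(x)$ are reducible.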

We present next a general construction, and then we illustrate it with some examples. 

Construction 3.1 Consider the binary polynomials modulo $f(x)$, where either $f(x)$ is irreducible or $f(x)=M_p(x)$, and let $mn<e(f(x))$, where $e(f(x))$ is the exponent of $f(x)$. Let $C(m,n,r,s;f(x))$ be the code whose $(mr+s)\times mn$ parity-check matrix is

$$\mathcal{H}(m,n,r,s)=\begin{pmatrix}H(n,r,0,0)&0(n,r)&\cdots&0(n,r)\\ 0(n,r)&H(n,r,0,r)&\cdots&0(n,r)\\ \vdots&\vdots&\ddots&\vdots\\ 0(n,r)&0(n,r)&\cdots&H(n,r,0,(m-1)r)\\ \multicolumn{4}{c}{H(mn,s,r,0)}\end{pmatrix} \qquad (1)$$

where $0(n,r)$ denotes the $r\times n$ zero matrix and, if $f(\alpha)=0$, $H(n,r,i,j)$ is the $r\times n$ matrix
Let us point out that matrices $H(n,r,i,j)$ as given by (2), in which each row is the square of the previous one, were used in [12][33][31] for constructing codes for which the metric is given by the rank, in [6] for constructing codes that can be encoded on columns and decoded on rows, and in [27] for constructing the so-called differential MDS codes.

Let us illustrate Construction 3.1 in the next example.



Example 3.1 Consider $m=3$ and $n=5$, with $r=2$ and $s=2$. Then the $(mr+s)\times mn=8\times 15$ parity-check matrix is

$\mathcal{H}(3,5,2,2)=$
So far we have not proved that Construction 3.1 provides PMDS codes. Actually, this is not true in general. The answer depends on the particular parameters and on the polynomial $f(x)$ defining the ring or field.

We denote by $(a_{i,j})_{0\le i\le m-1,\ 0\le j\le n-1}$ the received entries from a stored array in $C(m,n,r,s;f(x))$, assuming that the erased ones are equal to 0. The first step to retrieve the erased entries consists of computing the $rm+s$ syndromes. Using the parity-check matrix $\mathcal{H}(m,n,r,s)$ given by (1), the syndromes are
$$S_{ir}=\bigoplus_{j=0}^{n-1}a_{i,j},\qquad 0\le i\le m-1 \qquad (3)$$

$$S_{ir+l+1}=\bigoplus_{j=0}^{n-1}\alpha^{2^{l}j}\,a_{i,j},\qquad 0\le i\le m-1,\ 0\le l\le r-2 \qquad (4)$$

$$S_{mr+u}=\bigoplus_{i=0}^{m-1}\bigoplus_{j=0}^{n-1}\alpha^{2^{r+u-1}(in+j)}\,a_{i,j},\qquad 0\le u\le s-1 \qquad (5)$$



After computing the syndromes, the erasures are recovered by solving a linear system 
based on the parity-check matrix, provided that such a solution exists. In the next section, 
we study the case r = 1 and give necessary and sufficient conditions that determine whether 
a C(m, n, 1, s; f(x)) code is PMDS. 
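As an illustration of this recovery step, the sketch below builds the $r=1$ parity-check matrix described in Section 4 (one all-ones row per stripe, followed by rows $\alpha^{2^u(in+j)}$), computes syndromes, and solves for the erased entries by Gauss-Jordan elimination. It assumes a toy field $GF(2^4)$ defined by $f(x)=x^4+x+1$ (so $mn<15$), the global-parity placement of Figure 1, and hypothetical function names; it is a sketch, not an optimized decoder.

```python
import random

# Toy parameters (an assumption for illustration): GF(2^4) defined by the
# primitive polynomial f(x) = x^4 + x + 1, so e(f) = 15 and we need mn < 15.
F_POLY, B = 0b10011, 4


def gf_mul(a, b):
    """Multiply two GF(2^4) elements (ints < 16) modulo F_POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> B:
            a ^= F_POLY
    return r


def gf_pow(a, k):
    """a^k for nonzero a, using a^(2^B - 1) = 1."""
    r = 1
    for _ in range(k % (2**B - 1)):
        r = gf_mul(r, a)
    return r


def parity_check(m, n, s):
    """Rows of H(m, n, 1, s): one all-ones row per stripe, then alpha^(2^u (in+j))."""
    H = [[1 if c // n == i else 0 for c in range(m * n)] for i in range(m)]
    H += [[gf_pow(2, 2**u * c) for c in range(m * n)] for u in range(s)]
    return H


def syndromes(H, word):
    """S = H * word over GF(2^4); word is the array read row by row."""
    out = []
    for row in H:
        acc = 0
        for h, a in zip(row, word):
            acc ^= gf_mul(h, a)
        out.append(acc)
    return out


def solve_erasures(H, word, erased):
    """Fill the erased positions of word (in place) so that H * word = 0."""
    for e in erased:
        word[e] = 0                                  # erased entries read as 0
    A = [[row[e] for e in erased] + [sv] for row, sv in zip(H, syndromes(H, word))]
    for col in range(len(erased)):                   # Gauss-Jordan elimination
        piv = next((i for i in range(col, len(A)) if A[i][col]), None)
        if piv is None:
            raise ValueError("erasure pattern is not correctable")
        A[col], A[piv] = A[piv], A[col]
        inv = gf_pow(A[col][col], 2**B - 2)
        A[col] = [gf_mul(inv, v) for v in A[col]]
        for i in range(len(A)):
            if i != col and A[i][col]:
                factor = A[i][col]
                A[i] = [v ^ gf_mul(factor, w) for v, w in zip(A[i], A[col])]
    for k, e in enumerate(erased):
        word[e] = A[k][-1]
    return word


def encode(data, m, n, s, H):
    """Parities at (i, n-1) for i < m-1 and the last s+1 entries of row m-1."""
    parity = [i * n + n - 1 for i in range(m - 1)]
    parity += [(m - 1) * n + j for j in range(n - 1 - s, n)]
    return solve_erasures(H, list(data), parity)


m, n, s = 3, 4, 2
H = parity_check(m, n, s)
rng = random.Random(1)
codeword = encode([rng.randrange(16) for _ in range(m * n)], m, n, s, H)
# Column 1 fails (entries 1, 5, 9) plus hard errors at (0, 2) and (0, 3).
recovered = solve_erasures(H, list(codeword), [1, 5, 9, 2, 3])
print(recovered == codeword)                         # True
```

Encoding is handled as a special case of decoding, as in Section 5.2. With these toy parameters, the two-pair pattern with erased positions `[0, 2, 4, 5]` raises `ValueError`: $x^4+x+1$ divides $1+x^2+x^4(1+x)$, so condition (16) of Theorem 5.1 fails for $i=1$ and this code is not PMDS.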



4 The case r = 1 

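In this section, we assume that $r=1$; thus, the parity-check matrix $\mathcal{H}(m,n,1,s)$ given by (1) can be written as

$$\mathcal{H}(m,n,1,s)=\big(H_0(m,n,1,s),\,H_1(m,n,1,s),\,\ldots,\,H_{m-1}(m,n,1,s)\big),$$

where, for $0\le j\le m-1$, $H_j(m,n,1,s)$ is the $(m+s)\times n$ matrix whose first $m$ rows are zero except for row $j$, which is all ones, and whose last $s$ rows are

$$\begin{pmatrix}\alpha^{jn}&\alpha^{jn+1}&\cdots&\alpha^{(j+1)n-1}\\ \alpha^{2jn}&\alpha^{2(jn+1)}&\cdots&\alpha^{2((j+1)n-1)}\\ \alpha^{4jn}&\alpha^{4(jn+1)}&\cdots&\alpha^{4((j+1)n-1)}\\ \vdots&\vdots&&\vdots\\ \alpha^{2^{s-1}jn}&\alpha^{2^{s-1}(jn+1)}&\cdots&\alpha^{2^{s-1}((j+1)n-1)}\end{pmatrix}.$$

Assume that $\sum_{j=1}^{t}s_j=s$ for integers $s_j\ge 1$. According to Definition 2.1, we will characterize when $C(m,n,1,s;f(x))$ is $(1;s_1,s_2,\ldots,s_t)$-erasure correcting. We need a series of lemmas first.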
Lemma 4.1 For $s>1$, code $C(m,n,1,s;f(x))$ as given by Construction 3.1 is PMDS if and only if:

1. code $C(m,n,1,s-1;f(x))$ is PMDS, and

2. for any $(s_1,s_2,\ldots,s_t)$ such that $\sum_{j=1}^{t}s_j=s$, for any $0<i_2<i_3<\cdots<i_t\le m-1$, and for any $1\le j\le t$ and $0\le l_{j,0}<l_{j,1}<\cdots<l_{j,s_j}\le n-1$,

$$\gcd\Big(\sum_{u=1}^{s_1}\big(1+x^{l_{1,u}-l_{1,0}}\big)+\sum_{j=2}^{t}x^{i_jn+l_{j,0}-l_{1,0}}\sum_{u=1}^{s_j}\big(1+x^{l_{j,u}-l_{j,0}}\big),\ f(x)\Big)=1 \qquad (6)$$

Proof: Since $f(x)$ is implicit, let us denote $C(m,n,1,s;f(x))$ simply by $C(m,n,1,s)$. Consider rows $0\le i_1<i_2<\cdots<i_t\le m-1$ such that row $i_j$ has exactly $s_j+1$ erasures in locations $(i_j,l_{j,0}),(i_j,l_{j,1}),\ldots,(i_j,l_{j,s_j})$, for $0\le l_{j,0}<l_{j,1}<\cdots<l_{j,s_j}\le n-1$ and $\sum_{j=1}^{t}s_j=s$. Assuming the erased entries to be equal to zero and computing the syndromes according to (3) and (5), we obtain

$$\bigoplus_{v=0}^{s_j}a_{i_j,l_{j,v}}=S_{i_j},\qquad 1\le j\le t \qquad (7)$$

$$\bigoplus_{j=1}^{t}\bigoplus_{v=0}^{s_j}\alpha^{2^u(i_jn+l_{j,v})}\,a_{i_j,l_{j,v}}=S_{m+u},\qquad 0\le u\le s-1 \qquad (8)$$

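The system given by (7) and (8) has a unique solution if and only if the $(t+s)\times(t+s)$ matrix

$$c=\big(c_1\,|\,c_2\,|\,\cdots\,|\,c_t\big)$$

is invertible, where $c_j$ is the $(t+s)\times(s_j+1)$ matrix of the coefficients of $a_{i_j,l_{j,0}},\ldots,a_{i_j,l_{j,s_j}}$: its first $t$ rows are zero except for row $j$, which is all ones, and its entry in row $t+u$ and column $v$ is $\alpha^{2^u(i_jn+l_{j,v})}$, for $0\le u\le s-1$ and $0\le v\le s_j$.

By row operations on $c$ (adding $\alpha^{2^u(i_jn+l_{j,0})}$ times row $j$ to row $t+u$), we obtain a new $(t+s)\times(t+s)$ matrix

$$c'=\big(c'_1\,|\,c'_2\,|\,\cdots\,|\,c'_t\big),$$

where $c'_j$ is the $(t+s)\times(s_j+1)$ matrix that agrees with $c_j$ in its first $t$ rows, is zero in the last $s$ rows of its column 0, and has entry $\alpha^{2^u(i_jn+l_{j,0})}\big(1\oplus\alpha^{2^u(l_{j,v}-l_{j,0})}\big)$ in row $t+u$ and column $v$, $1\le v\le s_j$. Notice that $c$ is invertible if and only if $c'$ is invertible, if and only if (expanding the determinant of $c'$ along the $t$ columns with index $v=0$, each of which has a single nonzero entry) the $s\times s$ matrix $c''=(c''_1\,|\,c''_2\,|\,\cdots\,|\,c''_t)$ is invertible, where

$$c''_j=\begin{pmatrix}\alpha^{i_jn+l_{j,0}}\big(1\oplus\alpha^{l_{j,1}-l_{j,0}}\big)&\cdots&\alpha^{i_jn+l_{j,0}}\big(1\oplus\alpha^{l_{j,s_j}-l_{j,0}}\big)\\ \alpha^{2(i_jn+l_{j,0})}\big(1\oplus\alpha^{2(l_{j,1}-l_{j,0})}\big)&\cdots&\alpha^{2(i_jn+l_{j,0})}\big(1\oplus\alpha^{2(l_{j,s_j}-l_{j,0})}\big)\\ \vdots&&\vdots\\ \alpha^{2^{s-1}(i_jn+l_{j,0})}\big(1\oplus\alpha^{2^{s-1}(l_{j,1}-l_{j,0})}\big)&\cdots&\alpha^{2^{s-1}(i_jn+l_{j,0})}\big(1\oplus\alpha^{2^{s-1}(l_{j,s_j}-l_{j,0})}\big)\end{pmatrix}.$$

Dividing each row $u$, $0\le u\le s-1$, by $\alpha^{2^u(i_1n+l_{1,0})}$, we obtain the $s\times s$ matrix $\bar c=(\bar c_1\,|\,\bar c_2\,|\,\cdots\,|\,\bar c_t)$, where $\bar c_j$ has entry $\alpha^{2^u((i_j-i_1)n+l_{j,0}-l_{1,0})}\big(1\oplus\alpha^{2^u(l_{j,v}-l_{j,0})}\big)$ in row $u$ and column $v$; in particular,

$$\bar c_1=\begin{pmatrix}1\oplus\alpha^{l_{1,1}-l_{1,0}}&\cdots&1\oplus\alpha^{l_{1,s_1}-l_{1,0}}\\ 1\oplus\alpha^{2(l_{1,1}-l_{1,0})}&\cdots&1\oplus\alpha^{2(l_{1,s_1}-l_{1,0})}\\ \vdots&&\vdots\\ 1\oplus\alpha^{2^{s-1}(l_{1,1}-l_{1,0})}&\cdots&1\oplus\alpha^{2^{s-1}(l_{1,s_1}-l_{1,0})}\end{pmatrix}.$$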
Therefore, the $s\times s$ matrix $\bar c$ consists of a first row $w$ followed by successive rows $w_u$, where each row is the square of the previous row, i.e., $w_u=w^{2^u}$ (entrywise) for $1\le u\le s-1$. Matrix $\bar c$ is invertible if and only if its determinant is invertible. The determinant of a matrix of this type is known [6]: it is the product of the XORs of all nonempty subsets of elements of the first row. For example, if we have the matrix

$$\begin{pmatrix}\gamma_1&\gamma_2&\gamma_3\\ \gamma_1^2&\gamma_2^2&\gamma_3^2\\ \gamma_1^4&\gamma_2^4&\gamma_3^4\end{pmatrix},$$

then its determinant is $\gamma_1\gamma_2\gamma_3(\gamma_1\oplus\gamma_2)(\gamma_1\oplus\gamma_3)(\gamma_2\oplus\gamma_3)(\gamma_1\oplus\gamma_2\oplus\gamma_3)$. This result is proven similarly to the one for Vandermonde matrices. For the sake of completeness, we prove it in the Appendix.
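The $3\times 3$ determinant formula can be checked exhaustively in a small field. This sketch assumes $GF(2^4)$ defined by $x^4+x+1$ (an arbitrary choice for illustration) and compares a cofactor expansion of the squaring matrix with the product of the XORs of all nonempty subsets, for every triple $(\gamma_1,\gamma_2,\gamma_3)$.

```python
from itertools import combinations, product

F_POLY, B = 0b10011, 4   # assumed toy field GF(2^4) with f(x) = x^4 + x + 1


def gf_mul(a, b):
    """Multiply two GF(2^4) elements (ints < 16) modulo F_POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> B:
            a ^= F_POLY
    return r


def det3(M):
    """3 x 3 determinant by cofactor expansion; in characteristic 2 every sign is +."""
    (a, b, c), (d, e, g), (h, i, k) = M
    return (gf_mul(a, gf_mul(e, k) ^ gf_mul(g, i))
            ^ gf_mul(b, gf_mul(d, k) ^ gf_mul(g, h))
            ^ gf_mul(c, gf_mul(d, i) ^ gf_mul(e, h)))


def squaring_matrix(gammas):
    """Rows (gamma_j), (gamma_j^2), (gamma_j^4): each row squares the previous one."""
    row0 = list(gammas)
    row1 = [gf_mul(x, x) for x in row0]
    row2 = [gf_mul(x, x) for x in row1]
    return [row0, row1, row2]


def subset_xor_product(gammas):
    """Product of the XORs of all nonempty subsets of gammas."""
    prod = 1
    for size in range(1, len(gammas) + 1):
        for subset in combinations(gammas, size):
            acc = 0
            for x in subset:
                acc ^= x
            prod = gf_mul(prod, acc)
    return prod


mismatches = [t for t in product(range(2**B), repeat=3)
              if det3(squaring_matrix(t)) != subset_xor_product(t)]
print(len(mismatches))   # 0: the formula holds for every triple in GF(16)
```

In particular the determinant vanishes exactly when some nonempty subset of the $\gamma_j$ XORs to 0, which is why the proof reduces invertibility of $\bar c$ to invertibility of subset XORs of $w$.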
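The first row of matrix $\bar c$ is given by

$$w=\big(w_{0,1}\,|\,w_{0,2}\,|\,\cdots\,|\,w_{0,t}\big),$$

where

$$w_{0,1}=\big(1\oplus\alpha^{l_{1,1}-l_{1,0}},\ 1\oplus\alpha^{l_{1,2}-l_{1,0}},\ \ldots,\ 1\oplus\alpha^{l_{1,s_1}-l_{1,0}}\big)$$

and, for $2\le j\le t$,

$$w_{0,j}=\Big(\alpha^{(i_j-i_1)n+l_{j,0}-l_{1,0}}\big(1\oplus\alpha^{l_{j,1}-l_{j,0}}\big),\ \ldots,\ \alpha^{(i_j-i_1)n+l_{j,0}-l_{1,0}}\big(1\oplus\alpha^{l_{j,s_j}-l_{j,0}}\big)\Big). \qquad (9)$$

Then, code $C(m,n,1,s)$ is PMDS if and only if $\det(\bar c)$ is invertible for every such erasure pattern, if and only if the XOR of any nonempty subset of the elements of $w$ is invertible. Since $w$ has $s$ elements, the XOR of a proper subset of them corresponds to an erasure pattern with fewer than $s$ extra erasures, so by induction we may assume the result for such subsets, and it remains to take the XOR of all the $s$ elements in $w$. Then, code $C(m,n,1,s)$ is PMDS if and only if code $C(m,n,1,s-1)$ is PMDS and

$$\Big(\bigoplus_{u=1}^{s_1}\big(1\oplus\alpha^{l_{1,u}-l_{1,0}}\big)\Big)\oplus\bigoplus_{j=2}^{t}\alpha^{(i_j-i_1)n+l_{j,0}-l_{1,0}}\bigoplus_{u=1}^{s_j}\big(1\oplus\alpha^{l_{j,u}-l_{j,0}}\big) \qquad (10)$$

is invertible. Since (10) depends on the rows only through the differences $i_j-i_1$, we may take $i_1=0$; then (10) is the polynomial in (6) evaluated at $\alpha$, so (10) is invertible if and only if (6) holds. □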



Lemma 4.2 Consider a code $C(m,n,1,s;f(x))$ and let $s_j\ge 1$ for $1\le j\le t$ be such that $\sum_{j=1}^{t}s_j=s$. For each $s_j$, if $s_j$ is odd let $s'_j=s_j$, while if $s_j$ is even let $s'_j=s_j-1$, and let $s'=\sum_{j=1}^{t}s'_j$. Then, $C(m,n,1,s;f(x))$ is $(1;s_1,s_2,\ldots,s_t)$-erasure correcting if and only if $C(m,n,1,s';f(x))$ is $(1;s'_1,s'_2,\ldots,s'_t)$-erasure correcting.



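Proof: Consider (10). If $s_j$ is odd,

$$\bigoplus_{u=1}^{s_j}\big(1\oplus\alpha^{l_{j,u}-l_{j,0}}\big)=1\oplus\bigoplus_{u=1}^{s_j}\alpha^{l_{j,u}-l_{j,0}}, \qquad (11)$$

while if $s_j$ is even,

$$\bigoplus_{u=1}^{s_j}\big(1\oplus\alpha^{l_{j,u}-l_{j,0}}\big)=\bigoplus_{u=1}^{s_j}\alpha^{l_{j,u}-l_{j,0}}=\alpha^{l_{j,1}-l_{j,0}}\Big(1\oplus\bigoplus_{u=2}^{s_j}\alpha^{l_{j,u}-l_{j,1}}\Big). \qquad (12)$$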
If $s_1$ is even, making $s'_1=s_1-1$, according to (12) and (11), (10) becomes

$$\alpha^{l_{1,1}-l_{1,0}}\Big(\bigoplus_{u=2}^{s_1}\big(1\oplus\alpha^{l_{1,u}-l_{1,1}}\big)\oplus\bigoplus_{j=2}^{t}\alpha^{(i_j-i_1)n+l_{j,0}-l_{1,1}}\bigoplus_{u=1}^{s_j}\big(1\oplus\alpha^{l_{j,u}-l_{j,0}}\big)\Big), \qquad (13)$$

in which the expression in parentheses is (10) for the erasure pattern obtained by removing the erasure in location $(i_1,l_{1,0})$. Since $\alpha^{l_{1,1}-l_{1,0}}$ is always invertible, by (10) and (13), if $s_1$ is even, $C(m,n,1,s-1)$ is $(1;s'_1,s_2,\ldots,s_t)$-erasure correcting if and only if $C(m,n,1,s)$ is $(1;s_1,s_2,\ldots,s_t)$-erasure correcting. Similarly, if $2\le v\le t$ and $s_v$ is even, according to (12) and (11), (10) becomes
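$$\Big(\bigoplus_{u=1}^{s_1}\big(1\oplus\alpha^{l_{1,u}-l_{1,0}}\big)\Big)\oplus\bigoplus_{\substack{j=2\\ j\ne v}}^{t}\alpha^{(i_j-i_1)n+l_{j,0}-l_{1,0}}\bigoplus_{u=1}^{s_j}\big(1\oplus\alpha^{l_{j,u}-l_{j,0}}\big)\oplus\alpha^{(i_v-i_1)n+l_{v,1}-l_{1,0}}\bigoplus_{u=2}^{s_v}\big(1\oplus\alpha^{l_{v,u}-l_{v,1}}\big), \qquad (14)$$

which is (10) for the erasure pattern obtained by removing the erasure in location $(i_v,l_{v,0})$. By (10) and (14), we may claim that if $s_v$ is even for some $2\le v\le t$, then, making $s'_v=s_v-1$, $C(m,n,1,s-1)$ is $(1;s_1,\ldots,s_{v-1},s'_v,s_{v+1},\ldots,s_t)$-erasure correcting if and only if $C(m,n,1,s)$ is $(1;s_1,s_2,\ldots,s_t)$-erasure correcting. Applying this reduction to each even $s_j$ completes the proof. □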



Lemma 4.3 Consider a code $C(m,n,1,s;f(x))$ and let $s_j\ge 1$, each $s_j$ an odd number, for $1\le j\le t$, be such that $\sum_{j=1}^{t}s_j=s$. Then, $C(m,n,1,s;f(x))$ is $(1;s_1,s_2,\ldots,s_t)$-erasure correcting if and only if, for any $0<i_2<i_3<\cdots<i_t\le m-1$ and for any $0\le l_{j,0}<l_{j,1}<\cdots<l_{j,s_j}\le n-1$ for each $1\le j\le t$,

$$\gcd\Big(1+\sum_{u=1}^{s_1}x^{l_{1,u}-l_{1,0}}+\sum_{j=2}^{t}x^{i_jn+l_{j,0}-l_{1,0}}\Big(1+\sum_{u=1}^{s_j}x^{l_{j,u}-l_{j,0}}\Big),\ f(x)\Big)=1. \qquad (15)$$

Proof: Notice that in this case, by (11), condition (6) becomes (15). □

The combination of Lemmas 4.1, 4.2 and 4.3 gives the following theorem:

Theorem 4.1 For $s>1$, code $C(m,n,1,s;f(x))$ as given by Construction 3.1 is PMDS if and only if:

1. code $C(m,n,1,s-1;f(x))$ is PMDS, and

2. for every $(s_1,s_2,\ldots,s_t)$ such that $\sum_{j=1}^{t}s_j=s$ and each $s_j$ is odd, for any $0<i_2<i_3<\cdots<i_t\le m-1$, and for any $1\le j\le t$ and $0\le l_{j,0}<l_{j,1}<\cdots<l_{j,s_j}\le n-1$, condition (15) holds.
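For small parameters, Theorem 4.1 can be checked by brute force. The sketch below assumes binary polynomials encoded as Python ints (bit $k$ is the coefficient of $x^k$), takes $i_1=0$ as in the lemmas, and unrolls condition 1 down to $s=1$ (Lemma 5.1). The function names are hypothetical, and the enumeration grows quickly with $m$, $n$ and $s$.

```python
from itertools import combinations, product


def polymod(a, g):
    """Remainder of a modulo g (binary polynomials as ints, bit k <-> x^k)."""
    dg = g.bit_length() - 1
    while a and a.bit_length() - 1 >= dg:
        a ^= g << (a.bit_length() - 1 - dg)
    return a


def polygcd(a, b):
    """Euclid's algorithm in GF(2)[x]."""
    while b:
        a, b = b, polymod(a, b)
    return a


def odd_compositions(s):
    """Ordered tuples (s_1, ..., s_t) of odd positive integers with sum s."""
    if s == 0:
        yield ()
        return
    for first in range(1, s + 1, 2):
        for rest in odd_compositions(s - first):
            yield (first,) + rest


def row_poly(cols):
    """1 + sum_u x^(l_u - l_0) for erased columns l_0 < l_1 < ... of one row."""
    poly = 1
    for l in cols[1:]:
        poly ^= 1 << (l - cols[0])
    return poly


def condition_2(m, n, s, f):
    """Condition 2 of Theorem 4.1 (polynomial (15)), with i_1 = 0."""
    for parts in odd_compositions(s):
        t = len(parts)
        for rows in combinations(range(1, m), t - 1):       # 0 < i_2 < ... < i_t
            col_choices = [combinations(range(n), p + 1) for p in parts]
            for cols in product(*col_choices):
                poly = row_poly(cols[0])
                for i_j, c in zip(rows, cols[1:]):
                    shift = i_j * n + c[0] - cols[0][0]      # always >= 1
                    poly ^= row_poly(c) << shift
                if polygcd(f, poly) != 1:
                    return False
    return True


def is_pmds_r1(m, n, s, f):
    """Unroll the recursion of Theorem 4.1 down to s = 1 (Lemma 5.1)."""
    return all(condition_2(m, n, k, f) for k in range(1, s + 1))


print(is_pmds_r1(2, 5, 3, (1 << 11) - 1))   # True: M_11 is irreducible (Theorem 4.2)
print(is_pmds_r1(2, 4, 2, 0b10011))         # False: x^4 + x + 1 divides 1 + x^2 + x^4(1 + x)
```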

Theorem 4.1 gives us conditions to check in order to determine whether a code $C(m,n,1,s;f(x))$ as given by Construction 3.1 is PMDS, but by itself it does not provide us with any family of PMDS codes. Consider the ring of polynomials modulo $M_p(x)$, such that $mn<e(M_p(x))=p$. There are cases in which $M_p(x)$ is irreducible [7] and the ring becomes a field (equivalently, 2 is primitive in $GF(p)$). Notice that the polynomials in Theorem 4.1 have degree at most $mn-1<p-1=\deg(M_p(x))$, and they are nonzero, since their constant term is 1. Therefore, if $M_p(x)$ is irreducible, then all such polynomials are relatively prime with $M_p(x)$ and the code is PMDS. Let us state this fact as a theorem, which provides a family of PMDS codes (it is not known whether the number of irreducible polynomials $M_p(x)$ is infinite):

Theorem 4.2 Consider the code $C(m,n,1,s;M_p(x))$ given by Construction 3.1 such that $M_p(x)$ is irreducible (or equivalently, 2 is primitive in $GF(p)$). Then, $C(m,n,1,s;M_p(x))$ is PMDS.

So far we have dealt with general values of s. In the next section we examine special cases 
that are important in applications. 

5 Special cases 

We examine each case in a separate subsection.

5.1 The case C(m, n, 1, 1; f(x))

Notice that $C(m,n,1,1;f(x))$ is always PMDS, since by Theorem 4.1 we have to check whether the binomials of type $1+x^j$, for $1\le j\le n-1$, and $f(x)$ are relatively prime. This is certainly the case when $f(x)$ is irreducible (since $0<j<e(f(x))$, $\alpha^j\ne 1$), but also when $M_p(x)$ is reducible [7]. Let us state this result as a lemma:

Lemma 5.1 Code $C(m,n,1,1;f(x))$ is always PMDS.

5.2 The case C(m, n, 1, 2; f(x))

This case is important in applications, in particular for arrays of SSDs. Since $C(m,n,1,1;f(x))$ is PMDS, Theorem 4.1 gives the following theorem for the case $s=2$:

Theorem 5.1 Code $C(m,n,1,2;f(x))$ is PMDS if and only if, for any $1\le i\le m-1$, and for any $0\le l_{1,0}<l_{1,1}\le n-1$, $0\le l_{2,0}<l_{2,1}\le n-1$,

$$\gcd\big(1+x^{l_{1,1}-l_{1,0}}+x^{in+l_{2,0}-l_{1,0}}\big(1+x^{l_{2,1}-l_{2,0}}\big),\ f(x)\big)=1. \qquad (16)$$

Given the practical importance of this case, let us examine the decoding (of which the 
encoding is a special case) in some detail. 

Consider a PMDS code $C(m,n,1,2;f(x))$, i.e., one that satisfies the conditions of Theorem 5.1. Without loss of generality, assume that we have either three erasures in the same row $i_0$, or two pairs of erasures in different rows $i_0$ and $i_1$, where $0\le i_0<i_1\le m-1$. Consider first the case in which the three erasures occur in the same row $i_0$, in entries $j_0$, $j_1$ and $j_2$ of row $i_0$, $0\le j_0<j_1<j_2\le n-1$. Assuming initially that $a_{i_0,j_0}=a_{i_0,j_1}=a_{i_0,j_2}=0$, using (3) and (5) ((4) is used only for $r>1$), we compute the syndromes $S_{i_0}$, $S_m$ and $S_{m+1}$. Using the parity-check matrix $\mathcal{H}(m,n,1,2)$ as given by (1), we have to solve the linear system

$$\begin{aligned}a_{i_0,j_0}\oplus a_{i_0,j_1}\oplus a_{i_0,j_2}&=S_{i_0}\\ \alpha^{i_0n+j_0}a_{i_0,j_0}\oplus\alpha^{i_0n+j_1}a_{i_0,j_1}\oplus\alpha^{i_0n+j_2}a_{i_0,j_2}&=S_m\\ \alpha^{2(i_0n+j_0)}a_{i_0,j_0}\oplus\alpha^{2(i_0n+j_1)}a_{i_0,j_1}\oplus\alpha^{2(i_0n+j_2)}a_{i_0,j_2}&=S_{m+1}\end{aligned}$$
The solution to this system is

$$a_{i_0,j_0}=\frac{1}{\Delta}\det\begin{pmatrix}S_{i_0}&1&1\\ S_m&\alpha^{i_0n+j_1}&\alpha^{i_0n+j_2}\\ S_{m+1}&\alpha^{2(i_0n+j_1)}&\alpha^{2(i_0n+j_2)}\end{pmatrix},$$

$$a_{i_0,j_1}=\frac{1}{\Delta}\det\begin{pmatrix}1&S_{i_0}&1\\ \alpha^{i_0n+j_0}&S_m&\alpha^{i_0n+j_2}\\ \alpha^{2(i_0n+j_0)}&S_{m+1}&\alpha^{2(i_0n+j_2)}\end{pmatrix},$$

$$a_{i_0,j_2}=\frac{1}{\Delta}\det\begin{pmatrix}1&1&S_{i_0}\\ \alpha^{i_0n+j_0}&\alpha^{i_0n+j_1}&S_m\\ \alpha^{2(i_0n+j_0)}&\alpha^{2(i_0n+j_1)}&S_{m+1}\end{pmatrix},$$

where

$$\Delta=\det\begin{pmatrix}1&1&1\\ \alpha^{i_0n+j_0}&\alpha^{i_0n+j_1}&\alpha^{i_0n+j_2}\\ \alpha^{2(i_0n+j_0)}&\alpha^{2(i_0n+j_1)}&\alpha^{2(i_0n+j_2)}\end{pmatrix}.$$

Since the matrix

$$V=\begin{pmatrix}1&1&1\\ \alpha^{i_0n+j_0}&\alpha^{i_0n+j_1}&\alpha^{i_0n+j_2}\\ \alpha^{2(i_0n+j_0)}&\alpha^{2(i_0n+j_1)}&\alpha^{2(i_0n+j_2)}\end{pmatrix}$$

is a Vandermonde matrix,

$$\det V=\alpha^{3i_0n+2j_0+j_1}\big(1\oplus\alpha^{j_1-j_0}\big)\big(1\oplus\alpha^{j_2-j_0}\big)\big(1\oplus\alpha^{j_2-j_1}\big).$$

This determinant is easily inverted in a field, while in the ring of polynomials modulo $M_p(x)$, the elements $1\oplus\alpha^{j_1-j_0}$, $1\oplus\alpha^{j_2-j_0}$ and $1\oplus\alpha^{j_2-j_1}$ can be efficiently inverted (for details, see [7]).

The encoding is a special case of the decoding. For instance, assume that we place the two global parities in locations $(m-1,n-3)$ and $(m-1,n-2)$, as depicted in Figure 1. After computing the parities $a_{i,n-1}$ for $0\le i\le m-2$ using single parity, we have to compute the parities $a_{m-1,n-3}$, $a_{m-1,n-2}$ and $a_{m-1,n-1}$ using the method above. In particular, making $i_0=m-1$, $j_0=n-3$, $j_1=n-2$ and $j_2=n-1$, the Vandermonde determinant becomes $\alpha^{3(m-1)n+2(n-3)+(n-2)}(1\oplus\alpha)(1\oplus\alpha^2)(1\oplus\alpha)=\alpha^{3mn-8}(1\oplus\alpha^4)$. So, we only have to invert $1\oplus\alpha^4$ for the encoding, and some operations may be precalculated, making the encoding very efficient. We omit the details.
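The simplification of the product of binomials in the encoding determinant uses only the fact that squaring is additive in characteristic 2, that is, $(x\oplus y)^2=x^2\oplus y^2$:

$$(1\oplus\alpha)(1\oplus\alpha^2)(1\oplus\alpha)=(1\oplus\alpha)^2(1\oplus\alpha^2)=(1\oplus\alpha^2)(1\oplus\alpha^2)=(1\oplus\alpha^2)^2=1\oplus\alpha^4.$$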

We analyze next the case of two pairs of erasures in rows $i_0$ and $i_1$, $0\le i_0<i_1\le m-1$, and assume that the erased entries are $a_{i_0,j_0}$ and $a_{i_0,j_1}$ in row $i_0$, $0\le j_0<j_1\le n-1$, and $a_{i_1,\ell_0}$ and $a_{i_1,\ell_1}$ in row $i_1$, $0\le\ell_0<\ell_1\le n-1$.

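Again using the parity-check matrix $\mathcal{H}(m,n,1,2)$, we have to solve the linear system of 4 equations with 4 unknowns

$$\begin{aligned}a_{i_0,j_0}\oplus a_{i_0,j_1}&=S_{i_0}\\ a_{i_1,\ell_0}\oplus a_{i_1,\ell_1}&=S_{i_1}\\ \alpha^{i_0n+j_0}a_{i_0,j_0}\oplus\alpha^{i_0n+j_1}a_{i_0,j_1}\oplus\alpha^{i_1n+\ell_0}a_{i_1,\ell_0}\oplus\alpha^{i_1n+\ell_1}a_{i_1,\ell_1}&=S_m\\ \alpha^{2(i_0n+j_0)}a_{i_0,j_0}\oplus\alpha^{2(i_0n+j_1)}a_{i_0,j_1}\oplus\alpha^{2(i_1n+\ell_0)}a_{i_1,\ell_0}\oplus\alpha^{2(i_1n+\ell_1)}a_{i_1,\ell_1}&=S_{m+1}\end{aligned}$$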

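where $S_{i_0}$ and $S_{i_1}$ are given by (3), and $S_m$ and $S_{m+1}$ are given by (5). In order to solve this linear system, we need to invert the determinant

$$\det\begin{pmatrix}1&1&0&0\\ 0&0&1&1\\ \alpha^{i_0n+j_0}&\alpha^{i_0n+j_1}&\alpha^{i_1n+\ell_0}&\alpha^{i_1n+\ell_1}\\ \alpha^{2(i_0n+j_0)}&\alpha^{2(i_0n+j_1)}&\alpha^{2(i_1n+\ell_0)}&\alpha^{2(i_1n+\ell_1)}\end{pmatrix}.$$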

By row operations, we can easily see that this determinant is equal to the following determinant times a power of $\alpha$:

$$\det\begin{pmatrix}1\oplus\alpha^{j_1-j_0}&\alpha^{(i_1-i_0)n+\ell_0-j_0}\big(1\oplus\alpha^{\ell_1-\ell_0}\big)\\ 1\oplus\alpha^{2(j_1-j_0)}&\alpha^{2((i_1-i_0)n+\ell_0-j_0)}\big(1\oplus\alpha^{2(\ell_1-\ell_0)}\big)\end{pmatrix}.$$

Notice that this determinant corresponds to a $2\times 2$ Vandermonde matrix, and it equals $1\oplus\alpha^{j_1-j_0}\oplus\alpha^{(i_1-i_0)n+\ell_0-j_0}\big(1\oplus\alpha^{\ell_1-\ell_0}\big)$ times $\alpha^{(i_1-i_0)n+\ell_0-j_0}\big(1\oplus\alpha^{j_1-j_0}\big)\big(1\oplus\alpha^{\ell_1-\ell_0}\big)$. We have seen that the latter is easy to invert. Inverting $1\oplus\alpha^{j_1-j_0}\oplus\alpha^{(i_1-i_0)n+\ell_0-j_0}\big(1\oplus\alpha^{\ell_1-\ell_0}\big)$, however, is not as neat as inverting binomials $1\oplus\alpha^{j}$ when the size $b$ of the symbols is a large number. Since $1+x^{j_1-j_0}+x^{(i_1-i_0)n+\ell_0-j_0}\big(1+x^{\ell_1-\ell_0}\big)$ and $f(x)$ are relatively prime by Theorem 5.1, we can invert $1+x^{j_1-j_0}+x^{(i_1-i_0)n+\ell_0-j_0}\big(1+x^{\ell_1-\ell_0}\big)$ modulo $f(x)$ using Euclid's algorithm. This operation may take some computational time, but it is not done very often.
When it is invoked, performance has already been degraded due in general to a catastrophic 
failure. The emphasis here is on data recovery and not on performance, since data loss is 
not acceptable. 
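A minimal sketch of the Euclid-based inversion, assuming binary polynomials encoded as Python ints (bit $k$ is the coefficient of $x^k$). The helper `two_pair_poly` builds the polynomial of Theorem 5.1 for hypothetical erasure positions; all names are illustrative.

```python
def pdivmod(a, b):
    """Quotient and remainder of binary polynomials (ints as bit vectors)."""
    q, db = 0, b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        shift = a.bit_length() - 1 - db
        q ^= 1 << shift
        a ^= b << shift
    return q, a


def pmul(a, b):
    """Carry-less product of binary polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def pinv(a, f):
    """Inverse of a modulo f by the extended Euclidean algorithm over GF(2)."""
    r0, r1 = f, pdivmod(a, f)[1]
    t0, t1 = 0, 1            # invariant: t_k * a == r_k (mod f)
    while r1:
        q, r = pdivmod(r0, r1)
        r0, r1 = r1, r
        t0, t1 = t1, t0 ^ pmul(q, t1)
    if r0 != 1:
        raise ValueError("gcd(a, f) != 1: not invertible")
    return pdivmod(t0, f)[1]


def two_pair_poly(i0, j0, j1, i1, l0, l1, n):
    """1 + x^(j1-j0) + x^((i1-i0)n + l0 - j0) (1 + x^(l1-l0)), with i1 > i0."""
    d = (i1 - i0) * n + l0 - j0
    return 1 ^ (1 << (j1 - j0)) ^ (1 << d) ^ (1 << (d + l1 - l0))


M11 = (1 << 11) - 1                  # irreducible, since 2 is primitive mod 11
g = two_pair_poly(0, 0, 3, 1, 1, 4, n=5)
h = pinv(g, M11)
print(pdivmod(pmul(g, h), M11)[1])   # 1
```

If the polynomial shares a factor with $f(x)$, which is exactly the situation excluded by Theorem 5.1, `pinv` raises `ValueError` instead of returning an inverse.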

Let us now analyze some concrete PMDS codes $C(m,n,1,2;f(x))$. Consider first finite fields $GF(2^b)$. In Table 1 we give the value $b$, the irreducible polynomial $f(x)$ (in octal notation), the exponent $e(f(x))$, and values $m$ and $n$ for which the code $C(m,n,1,2;f(x))$ is PMDS according to Theorem 5.1. We have not checked all possible irreducible polynomials, so we are not claiming that the values of $m$ and $n$ are maximal in each case, but it is certainly feasible to do so. For extensive tables of irreducible polynomials, see [40].



Next, consider the case of codes $C(m,n,1,2;M_p(x))$. Theorem 4.2 solves the case in which $M_p(x)$ is irreducible, so assume that $M_p(x)$ is not irreducible, i.e., the ring is not a field.

Table 1: Some values of b, f(x), m and n for which codes C(m, n, 1, 2; f(x)) are PMDS

This ring was considered for the BR codes [7] because it allows for efficient correction of erasures for symbols of large size without using look-up tables, as is done in the case of finite fields. We need to check all possible cases of Theorem 5.1 for different values of $m$ and $n$, $mn<p$.

The results are tabulated in Table 2, which gives the list of primes between 17 and 257 for which $M_p(x)$ is reducible (hence, 2 is not primitive in $GF(p)$), together with some values of $m$ and $n$, and a statement indicating whether the code is PMDS or not. For most such primes the codes are PMDS. The only exceptions are 31, 73 and 89. The case $p=89$ is particularly interesting, since for $m=8$ and $n=11$, as well as for $m=n=9$, the codes are not PMDS. However, for $m=11$ and $n=8$ the code is PMDS, which illustrates the fact that whether a code is PMDS depends not only on the polynomial $M_p(x)$ chosen, but also on $m$ and $n$.

5.3 The case C(m, n, 1, 3; f(x)) 

There are two ways to obtain $s=3$ as a sum of odd numbers: one is 3 itself, the other is $1+1+1$. Then, by Theorem 4.1, we have

Theorem 5.2 Code $C(m,n,1,3;f(x))$ is PMDS if and only if code $C(m,n,1,2;f(x))$ is PMDS, and, for $1\le l_1<l_2<l_3\le n-1$,

$$\gcd\big(1+x^{l_1}+x^{l_2}+x^{l_3},\ f(x)\big)=1 \qquad (17)$$

and, for any $1\le i_2<i_3\le m-1$, $0\le l_{1,0}<l_{1,1}\le n-1$, $0\le l_{2,0}<l_{2,1}\le n-1$ and $0\le l_{3,0}<l_{3,1}\le n-1$,

$$\gcd\Big(1+x^{l_{1,1}-l_{1,0}}+x^{i_2n+l_{2,0}-l_{1,0}}\big(1+x^{l_{2,1}-l_{2,0}}\big)+x^{i_3n+l_{3,0}-l_{1,0}}\big(1+x^{l_{3,1}-l_{3,0}}\big),\ f(x)\Big)=1 \qquad (18)$$

Table 2: Values of p such that 2 is not primitive in GF(p), and some codes C(m, n, 1, 2; M_p(x)), mn < p.

So, in order to check whether code C(m, n, 1, 3; f(x)) is PMDS, we first check whether code C(m, n, 1, 2; f(x)) is PMDS, as in the cases tabulated in Tables 1 and 2. Then we check whether conditions (17) and (18) of Theorem 5.2 are satisfied.

For instance, the codes C(m, n, 1, 2; f(x)) in Table 1 are PMDS, but condition (18) in Theorem 5.2 is quite restrictive, and most of the entries do not correspond to codes C(m, n, 1, 3; f(x)) that are PMDS. In Table 2, however, several of the codes C(m, n, 1, 2; M_p(x)) that are PMDS give codes C(m, n, 1, 3; M_p(x)) that are also PMDS. We give the results in Table 3, which shows that for the primes 17, 43, 89, 127, 151, 241 and 257, and also for 89 with (m, n) = (11, 8), the codes C(m, n, 1, 3; M_p(x)) are not PMDS, although the corresponding codes C(m, n, 1, 2; M_p(x)) were PMDS.
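The two conditions of Theorem 5.2 can be checked by brute force over their index ranges. The following Python sketch is our own illustration, not the authors' software: binary polynomials are encoded as integers (bit k is the coefficient of x^k), and the names condition_17 and condition_18 are hypothetical. Whether C(m, n, 1, 2; f(x)) itself is PMDS has to be checked separately.

```python
from itertools import combinations, product


def gf2_mod(a: int, f: int) -> int:
    """Remainder of a modulo f; binary polynomials are ints, bit k = coefficient of x^k."""
    df = f.bit_length()
    while a and a.bit_length() >= df:
        a ^= f << (a.bit_length() - df)
    return a


def gf2_gcd(a: int, b: int) -> int:
    while b:
        a, b = b, gf2_mod(a, b)
    return a


def mp(p: int) -> int:
    """M_p(x) = 1 + x + ... + x^(p-1)."""
    return (1 << p) - 1


def condition_17(n: int, f: int) -> bool:
    # gcd(1 + x^l1 + x^l2 + x^l3, f(x)) = 1 for all 1 <= l1 < l2 < l3 <= n-1
    return all(gf2_gcd(1 ^ (1 << l1) ^ (1 << l2) ^ (1 << l3), f) == 1
               for l1, l2, l3 in combinations(range(1, n), 3))


def condition_18(m: int, n: int, f: int) -> bool:
    pairs = list(combinations(range(n), 2))  # (l_{k,0}, l_{k,1}) with l_{k,0} < l_{k,1}
    for i2, i3 in combinations(range(1, m), 2):
        for (a0, a1), (b0, b1), (c0, c1) in product(pairs, repeat=3):
            g = 1 ^ (1 << (a1 - a0))
            g ^= (1 << (i2 * n + b0 - a0)) ^ (1 << (i2 * n + b1 - a0))
            g ^= (1 << (i3 * n + c0 - a0)) ^ (1 << (i3 * n + c1 - a0))
            if gf2_gcd(g, f) != 1:
                return False
    return True


print(condition_17(4, mp(7)), condition_17(5, mp(7)))
```

As a small sanity check, M_7(x) = (1 + x + x^3)(1 + x^2 + x^3) and 1 + x^2 + x^3 + x^4 = (1 + x)(1 + x + x^3), so (17) fails for f(x) = M_7(x) as soon as n = 5, while for n = 4 the only candidate is 1 + x + x^2 + x^3 = (1 + x)^3, which is coprime to M_7(x).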

5.4 The case C(m, n, 1, 4; f(x))

As in Subsection 5.3, we start by writing s = 4 in all possible ways as a sum of odd numbers. There are three ways of doing so: 4 = 1 + 3, 4 = 3 + 1 and 4 = 1 + 1 + 1 + 1. By Theorem 4.1 we have:

Theorem 5.3 Code C(m, n, 1, 4; f(x)) as given by Construction 3.1 is PMDS if and only if code C(m, n, 1, 3; f(x)) is PMDS and, for any $1 \le i \le m-1$, $0 \le l_{1,0} < l_{1,1} \le n-1$ and $0 \le l_{2,0} < l_{2,1} < l_{2,2} < l_{2,3} \le n-1$,

$$\gcd\Big(1 + x^{l_{1,1}-l_{1,0}} + x^{in+l_{2,0}-l_{1,0}}\big(1 + x^{l_{2,1}-l_{2,0}} + x^{l_{2,2}-l_{2,0}} + x^{l_{2,3}-l_{2,0}}\big),\, f(x)\Big) = 1,$$

for any $1 \le i \le m-1$, $0 \le l_{1,0} < l_{1,1} < l_{1,2} < l_{1,3} \le n-1$ and $0 \le l_{2,0} < l_{2,1} \le n-1$,

$$\gcd\Big(1 + x^{l_{1,1}-l_{1,0}} + x^{l_{1,2}-l_{1,0}} + x^{l_{1,3}-l_{1,0}} + x^{in+l_{2,0}-l_{1,0}}\big(1 + x^{l_{2,1}-l_{2,0}}\big),\, f(x)\Big) = 1,$$

and for any $1 \le i_2 < i_3 < i_4 \le m-1$, $0 \le l_{1,0} < l_{1,1} \le n-1$, $0 \le l_{2,0} < l_{2,1} \le n-1$, $0 \le l_{3,0} < l_{3,1} \le n-1$ and $0 \le l_{4,0} < l_{4,1} \le n-1$,

$$\gcd\Big(1 + x^{l_{1,1}-l_{1,0}} + x^{i_2 n+l_{2,0}-l_{1,0}}\big(1 + x^{l_{2,1}-l_{2,0}}\big) + x^{i_3 n+l_{3,0}-l_{1,0}}\big(1 + x^{l_{3,1}-l_{3,0}}\big) + x^{i_4 n+l_{4,0}-l_{1,0}}\big(1 + x^{l_{4,1}-l_{4,0}}\big),\, f(x)\Big) = 1.$$
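All gcd conditions in Theorems 5.2 and 5.3 share one shape: each erased entry (i, l) contributes the monomial x^{(i - i_1)n + l - l_{1,0}}, where (i_1, l_{1,0}) is the first erased entry of the first affected row, and the sum must be coprime to f(x). The Python sketch below (our own helper names, not from the paper) builds that polynomial directly from a list of erased cells, so a single function covers (17), (18) and the three conditions of Theorem 5.3.

```python
def gf2_mod(a, f):
    """Remainder of a modulo f; binary polynomials are ints, bit k = coefficient of x^k."""
    df = f.bit_length()
    while a and a.bit_length() >= df:
        a ^= f << (a.bit_length() - df)
    return a


def gf2_gcd(a, b):
    while b:
        a, b = b, gf2_mod(a, b)
    return a


def pattern_poly(cells, n):
    """Sum over erased cells (i, l) of x^(i*n + l - base), base = smallest i*n + l."""
    idx = [i * n + l for i, l in cells]
    base = min(idx)
    g = 0
    for e in idx:
        g ^= 1 << (e - base)
    return g


def pattern_ok(cells, n, f):
    """True if the pattern polynomial is coprime to f(x)."""
    return gf2_gcd(pattern_poly(cells, n), f) == 1


# Third condition of Theorem 5.3 with n = 5: rows 0, 1, 2 and 4, two erasures each
cells = [(0, 1), (0, 3), (1, 0), (1, 2), (2, 1), (2, 4), (4, 0), (4, 3)]
print(pattern_ok(cells, 5, (1 << 29) - 1))  # f(x) = M_29(x)
```

Because x is invertible modulo f(x) whenever f(0) = 1, shifting all exponents by the same amount does not change whether the gcd is 1, which is why the smallest index can be used as the base.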

Consider next a restricted situation for a code C(m, n, 1, 4; f(x)). In [27], codes were constructed that can recover from an erased column together with either a row with up to two errors, or two different rows with up to one error each. From our coding point of view (recall that correcting an error requires as much redundancy as correcting two erasures), a C(m, n, 1, 4; f(x)) code that is both (1;4)-erasure correcting and (1;2,2)-erasure correcting will accomplish this. These conditions are actually stronger than those in [27], since they do not require an erased column: the erasures can be anywhere in the row. For reasons of space, we do not address the decoding algorithm for errors here, and concentrate on the existence of such a code.

Table 3: Some codes C(m, n, 1, 3; M_p(x)) such that p ≤ 257 and M_p(x) is not irreducible

Notice that, according to Lemma 4.3, a code C(m, n, 1, 4; f(x)) is (1;4)-erasure correcting if and only if, for any $1 \le l_1 < l_2 < l_3 \le n-1$, (17) holds.

Also by Lemma 4.3, a code C(m, n, 1, 4; f(x)) is (1;2,2)-erasure correcting if and only if, for any $1 \le i \le m-1$, $0 \le l_{1,0} < l_{1,1} \le n-1$ and $0 \le l_{2,0} < l_{2,1} \le n-1$, (16) holds. But (16) is exactly the condition for code C(m, n, 1, 2; f(x)) to be PMDS by Theorem 5.1. Thus, we have the following lemma:

Lemma 5.2 Code C(m, n, 1, 4; f(x)) is both (1;4)-erasure correcting and (1;2,2)-erasure correcting if and only if code C(m, n, 1, 2; f(x)) is PMDS and, for any $1 \le l_1 < l_2 < l_3 \le n-1$, (17) holds.

We can verify in Table 2 that, for the codes C(m, n, 1, 2; M_p(x)) that are PMDS, (17) holds. Therefore, by Lemma 5.2, the corresponding codes C(m, n, 1, 4; M_p(x)) are both (1;4)-erasure correcting and (1;2,2)-erasure correcting.

5.5 The case C(m, n, r, 1; f(x)) 

So far in this section we have considered cases in which r = 1. If r = s = 1, we have seen in Subsection 5.1 that the code is PMDS, so here we examine the case r > 1. Thus, assume that row i, $0 \le i \le m-1$, has r + 1 erasures in locations $0 \le j_0 < j_1 < \cdots < j_r \le n-1$. The following theorem is given without proof (it is proven similarly to the previous cases, by examining determinants):

Theorem 5.4 Consider code C(m, n, r, 1; f(x)). If r is even, then C(m, n, r, 1; f(x)) is PMDS if and only if C(m, n, r − 1, 1; f(x)) is PMDS, while if r is odd, C(m, n, r, 1; f(x)) is PMDS if and only if C(m, n, r − 1, 1; f(x)) is PMDS and, for any $1 \le l_1 < l_2 < \cdots < l_r \le n-1$,

$$\gcd\left(1 + x^{l_1} + x^{l_2} + \cdots + x^{l_r},\, f(x)\right) = 1 \tag{19}$$

Since C(m, n, 1, 1; f(x)) is PMDS, by Theorem 5.4 C(m, n, 2, 1; f(x)) is also PMDS. According to (19), C(m, n, 3, 1; f(x)) and C(m, n, 4, 1; f(x)) are PMDS if and only if, for any $1 \le l_1 < l_2 < l_3 \le n-1$, (17) holds.
5.6 The case C(m, n, 2, 2; f(x))



For C(m, n, 2, 2; f(x)) to be PMDS, it has to be both (2;2)-erasure correcting and (2;1,1)-erasure correcting. As in the previous subsection, C(m, n, 2, 2; f(x)) is (2;2)-erasure correcting if and only if, for any $1 \le l_1 < l_2 < l_3 \le n-1$, (17) holds. We have also seen at the end of the previous subsection that this is equivalent to saying that code C(m, n, 3, 1; f(x)) is PMDS. By examining the conditions under which code C(m, n, 2, 2; f(x)) is (2;1,1)-erasure correcting, we obtain the following theorem (again, without proof):

Theorem 5.5 Code C(m, n, 2, 2; f(x)) is PMDS if and only if code C(m, n, 3, 1; f(x)) is PMDS and, for any $1 \le i \le m-1$, $0 \le l_{1,0} < l_{1,1} < l_{1,2} \le n-1$ and $0 \le l_{2,0} < l_{2,1} < l_{2,2} \le n-1$, if

$$\begin{aligned}
g(x) = {} & 1 + x^{l_{1,1}-l_{1,0}} + x^{l_{1,2}-l_{1,0}} + x^{2(l_{1,1}-l_{1,0})} + x^{2(l_{1,2}-l_{1,0})} + x^{(l_{1,1}-l_{1,0})+(l_{1,2}-l_{1,0})} \\
& + x^{2(in+l_{2,0}-l_{1,0})}\Big(1 + x^{l_{2,1}-l_{2,0}} + x^{l_{2,2}-l_{2,0}} + x^{2(l_{2,1}-l_{2,0})} + x^{2(l_{2,2}-l_{2,0})} + x^{(l_{2,1}-l_{2,0})+(l_{2,2}-l_{2,0})}\Big),
\end{aligned}$$

then $\gcd(g(x), f(x)) = 1$.
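In g(x), each row contributes the six products of two (not necessarily distinct) monomials from {1, x^{l_{k,1} - l_{k,0}}, x^{l_{k,2} - l_{k,0}}}; when two of these products coincide (for instance, if l_{k,2} - l_{k,0} = 2(l_{k,1} - l_{k,0})), they cancel over GF(2). The Python sketch below is our own illustration (hypothetical names g_poly and theorem_5_5_condition); it builds g(x) with XOR and checks only the gcd condition of Theorem 5.5, not the requirement that C(m, n, 3, 1; f(x)) be PMDS.

```python
from itertools import combinations


def gf2_mod(a, f):
    """Remainder of a modulo f; binary polynomials are ints, bit k = coefficient of x^k."""
    df = f.bit_length()
    while a and a.bit_length() >= df:
        a ^= f << (a.bit_length() - df)
    return a


def gf2_gcd(a, b):
    while b:
        a, b = b, gf2_mod(a, b)
    return a


def h2(exps):
    """XOR of x^(e_s + e_t) over all pairs s <= t: the six terms one row contributes."""
    g = 0
    for s, es in enumerate(exps):
        for et in exps[s:]:
            g ^= 1 << (es + et)
    return g


def g_poly(i, n, row1, row2):
    """g(x) of Theorem 5.5; row1 = (l_{1,0}, l_{1,1}, l_{1,2}), row2 = (l_{2,0}, l_{2,1}, l_{2,2})."""
    a = [l - row1[0] for l in row1]
    b = [l - row2[0] for l in row2]
    return h2(a) ^ (h2(b) << (2 * (i * n + row2[0] - row1[0])))


def theorem_5_5_condition(m, n, f):
    triples = list(combinations(range(n), 3))
    return all(gf2_gcd(g_poly(i, n, r1, r2), f) == 1
               for i in range(1, m) for r1 in triples for r2 in triples)


print(theorem_5_5_condition(2, 4, (1 << 19) - 1))
```

Note that g(1) = 0 always (twelve terms, cancelling in pairs), so 1 + x divides g(x); this is one reason f(x) must not have 1 + x as a factor, which holds for irreducible f(x) of degree greater than 1 and for M_p(x).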



6 An alternative construction 

In this section we present an alternative to Construction 3.1.

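Construction 6.1 Consider the binary polynomials modulo f(x), where either f(x) is irreducible or f(x) = M_p(x), and let mn < e(f(x)). Let C^(1)(m, n, r, s; f(x)) be the code whose (mr + s) × mn parity-check matrix is

$$\mathcal{H}^{(1)}(m,n,r,s)=\begin{pmatrix}
H^{(1)}(n,r,0,0) & 0(n,r) & \cdots & 0(n,r)\\
0(n,r) & H^{(1)}(n,r,0,n) & \cdots & 0(n,r)\\
\vdots & \vdots & \ddots & \vdots\\
0(n,r) & 0(n,r) & \cdots & H^{(1)}(n,r,0,(m-1)n)\\
\multicolumn{4}{c}{H^{(1)}(mn,s,r,0)}
\end{pmatrix} \tag{20}$$

where, if f(α) = 0, H^(1)(n, r, i, j) is the r × n matrix

$$H^{(1)}(n,r,i,j)=\begin{pmatrix}
\alpha^{ij} & \alpha^{i(j+1)} & \alpha^{i(j+2)} & \cdots & \alpha^{i(j+n-1)}\\
\alpha^{(i+1)j} & \alpha^{(i+1)(j+1)} & \alpha^{(i+1)(j+2)} & \cdots & \alpha^{(i+1)(j+n-1)}\\
\alpha^{(i+2)j} & \alpha^{(i+2)(j+1)} & \alpha^{(i+2)(j+2)} & \cdots & \alpha^{(i+2)(j+n-1)}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
\alpha^{(i+r-1)j} & \alpha^{(i+r-1)(j+1)} & \alpha^{(i+r-1)(j+2)} & \cdots & \alpha^{(i+r-1)(j+n-1)}
\end{pmatrix} \tag{21}$$

and 0(n, r) is an r × n zero matrix.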
Next we illustrate Construction 6.1 with an example.



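Example 6.1 Consider m = 3 and n = 5. Then,

$$\mathcal{H}^{(1)}(3,5,1,3)=\left(\begin{array}{*{15}{c}}
1&1&1&1&1&0&0&0&0&0&0&0&0&0&0\\
0&0&0&0&0&1&1&1&1&1&0&0&0&0&0\\
0&0&0&0&0&0&0&0&0&0&1&1&1&1&1\\
1&\alpha&\alpha^{2}&\alpha^{3}&\alpha^{4}&\alpha^{5}&\alpha^{6}&\alpha^{7}&\alpha^{8}&\alpha^{9}&\alpha^{10}&\alpha^{11}&\alpha^{12}&\alpha^{13}&\alpha^{14}\\
1&\alpha^{2}&\alpha^{4}&\alpha^{6}&\alpha^{8}&\alpha^{10}&\alpha^{12}&\alpha^{14}&\alpha^{16}&\alpha^{18}&\alpha^{20}&\alpha^{22}&\alpha^{24}&\alpha^{26}&\alpha^{28}\\
1&\alpha^{3}&\alpha^{6}&\alpha^{9}&\alpha^{12}&\alpha^{15}&\alpha^{18}&\alpha^{21}&\alpha^{24}&\alpha^{27}&\alpha^{30}&\alpha^{33}&\alpha^{36}&\alpha^{39}&\alpha^{42}
\end{array}\right),$$

$$\mathcal{H}^{(1)}(3,5,3,1)=\left(\begin{array}{*{15}{c}}
1&1&1&1&1&0&0&0&0&0&0&0&0&0&0\\
1&\alpha&\alpha^{2}&\alpha^{3}&\alpha^{4}&0&0&0&0&0&0&0&0&0&0\\
1&\alpha^{2}&\alpha^{4}&\alpha^{6}&\alpha^{8}&0&0&0&0&0&0&0&0&0&0\\
0&0&0&0&0&1&1&1&1&1&0&0&0&0&0\\
0&0&0&0&0&\alpha^{5}&\alpha^{6}&\alpha^{7}&\alpha^{8}&\alpha^{9}&0&0&0&0&0\\
0&0&0&0&0&\alpha^{10}&\alpha^{12}&\alpha^{14}&\alpha^{16}&\alpha^{18}&0&0&0&0&0\\
0&0&0&0&0&0&0&0&0&0&1&1&1&1&1\\
0&0&0&0&0&0&0&0&0&0&\alpha^{10}&\alpha^{11}&\alpha^{12}&\alpha^{13}&\alpha^{14}\\
0&0&0&0&0&0&0&0&0&0&\alpha^{20}&\alpha^{22}&\alpha^{24}&\alpha^{26}&\alpha^{28}\\
1&\alpha^{3}&\alpha^{6}&\alpha^{9}&\alpha^{12}&\alpha^{15}&\alpha^{18}&\alpha^{21}&\alpha^{24}&\alpha^{27}&\alpha^{30}&\alpha^{33}&\alpha^{36}&\alpha^{39}&\alpha^{42}
\end{array}\right),$$

$$\mathcal{H}^{(1)}(3,5,2,2)=\left(\begin{array}{*{15}{c}}
1&1&1&1&1&0&0&0&0&0&0&0&0&0&0\\
1&\alpha&\alpha^{2}&\alpha^{3}&\alpha^{4}&0&0&0&0&0&0&0&0&0&0\\
0&0&0&0&0&1&1&1&1&1&0&0&0&0&0\\
0&0&0&0&0&\alpha^{5}&\alpha^{6}&\alpha^{7}&\alpha^{8}&\alpha^{9}&0&0&0&0&0\\
0&0&0&0&0&0&0&0&0&0&1&1&1&1&1\\
0&0&0&0&0&0&0&0&0&0&\alpha^{10}&\alpha^{11}&\alpha^{12}&\alpha^{13}&\alpha^{14}\\
1&\alpha^{2}&\alpha^{4}&\alpha^{6}&\alpha^{8}&\alpha^{10}&\alpha^{12}&\alpha^{14}&\alpha^{16}&\alpha^{18}&\alpha^{20}&\alpha^{22}&\alpha^{24}&\alpha^{26}&\alpha^{28}\\
1&\alpha^{3}&\alpha^{6}&\alpha^{9}&\alpha^{12}&\alpha^{15}&\alpha^{18}&\alpha^{21}&\alpha^{24}&\alpha^{27}&\alpha^{30}&\alpha^{33}&\alpha^{36}&\alpha^{39}&\alpha^{42}
\end{array}\right).$$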
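The block structure of (20) and (21) can be generated mechanically. This Python sketch is our own illustration (h1_block and h1_matrix are hypothetical names): it records, for every entry of H^(1)(m, n, r, s), the exponent of α, with None standing for a zero entry, so the output can be compared entry by entry with Example 6.1.

```python
def h1_block(n, r, i, j):
    """Exponents of H^(1)(n, r, i, j) in (21): entry (u, v) is alpha^((i+u)(j+v))."""
    return [[(i + u) * (j + v) for v in range(n)] for u in range(r)]


def h1_matrix(m, n, r, s):
    """The (mr + s) x mn matrix (20) as exponents of alpha; None marks a zero entry."""
    rows = []
    for b in range(m):
        block = h1_block(n, r, 0, b * n)  # diagonal block H^(1)(n, r, 0, bn)
        for u in range(r):
            row = [None] * (m * n)
            row[b * n:(b + 1) * n] = block[u]
            rows.append(row)
    rows += h1_block(m * n, s, r, 0)  # bottom block H^(1)(mn, s, r, 0)
    return rows


def show(row):
    return " ".join("0" if e is None else "1" if e == 0 else f"a^{e}" for e in row)


for row in h1_matrix(3, 5, 3, 1):
    print(show(row))
```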



Notice that C(m, n, 1, 2; f(x)) and C^(1)(m, n, 1, 2; f(x)) coincide. In the next subsections we analyze some special cases.

6.1 The case C^(1)(m, n, r, 1; f(x))

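As in Subsection 5.5, we have to examine under which conditions code C^(1)(m, n, r, 1; f(x)) is (r;1)-erasure correcting. Using the parity-check matrix H^(1)(m, n, r, 1) as defined by (20), C^(1)(m, n, r, 1; f(x)) is (r;1)-erasure correcting if and only if, for any $0 \le i \le m-1$ and any $0 \le j_0 < j_1 < \cdots < j_r \le n-1$, the Vandermonde determinant

$$\det\begin{pmatrix}
1 & 1 & \cdots & 1\\
\alpha^{in+j_0} & \alpha^{in+j_1} & \cdots & \alpha^{in+j_r}\\
\alpha^{2(in+j_0)} & \alpha^{2(in+j_1)} & \cdots & \alpha^{2(in+j_r)}\\
\alpha^{3(in+j_0)} & \alpha^{3(in+j_1)} & \cdots & \alpha^{3(in+j_r)}\\
\vdots & \vdots & & \vdots\\
\alpha^{r(in+j_0)} & \alpha^{r(in+j_1)} & \cdots & \alpha^{r(in+j_r)}
\end{pmatrix} = \prod_{0 \le t < u \le r}\left(\alpha^{in+j_t} \oplus \alpha^{in+j_u}\right)$$

is invertible. Each factor equals $\alpha^{in+j_t}\big(1 \oplus \alpha^{j_u-j_t}\big)$ with $0 < j_u - j_t < n < e(f(x))$, so this is always the case, and we have the following theorem:

Theorem 6.1 Code C^(1)(m, n, r, 1; f(x)) is PMDS.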
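The Vandermonde argument can be checked numerically on a small field. The Python sketch below assumes f(x) = x^4 + x + 1 (a primitive polynomial chosen only for illustration, exponent 15) and m = 3, n = 4; it computes each determinant by Leibniz expansion, which is adequate for these tiny sizes, and compares it with the product of pairwise sums. The helper names are our own.

```python
from itertools import combinations, permutations

F = 0b10011  # f(x) = x^4 + x + 1: an illustrative choice, alpha = x has exponent 15


def pmod(a, f=F):
    """Reduce a binary polynomial (int, bit k = coefficient of x^k) modulo f."""
    df = f.bit_length()
    while a and a.bit_length() >= df:
        a ^= f << (a.bit_length() - df)
    return a


def pmul(a, b, f=F):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return pmod(r, f)


def alpha_pow(e, f=F):
    return pmod(1 << e, f)


def det(M, f=F):
    """Leibniz expansion; signs disappear in characteristic 2."""
    total = 0
    for perm in permutations(range(len(M))):
        term = 1
        for row, col in enumerate(perm):
            term = pmul(term, M[row][col], f)
        total ^= term
    return total


def vandermonde_ok(m, n, r, f=F):
    """Every (r;1) pattern: det equals the product of pairwise sums and is nonzero."""
    for i in range(m):
        for js in combinations(range(n), r + 1):
            pts = [alpha_pow(i * n + j, f) for j in js]
            M = [[alpha_pow(k * (i * n + j), f) for j in js] for k in range(r + 1)]
            prod = 1
            for t, u in combinations(range(r + 1), 2):
                prod = pmul(prod, pts[t] ^ pts[u], f)
            if prod == 0 or det(M, f) != prod:
                return False
    return True


print([vandermonde_ok(3, 4, r) for r in (1, 2, 3)])
```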

Comparing Theorems 5.4 and 6.1, we conclude that codes C^(1)(m, n, r, 1; f(x)) are preferable to codes C(m, n, r, 1; f(x)) for r > 2, since the former are PMDS without restrictions.

6.2 The case C^(1)(m, n, 1, 3; f(x))
We give the following theorem without proof: 

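Theorem 6.2 Code C^(1)(m, n, 1, 3; f(x)) as given by Construction 6.1 is PMDS if and only if, for any $0 \le i_1 \ne i_2 \le m-1$, $0 \le l_{1,0} < l_{1,1} < l_{1,2} \le n-1$ and $0 \le l_{2,0} < l_{2,1} \le n-1$, where $f(\alpha) = 0$, the following matrix is invertible,

$$\begin{pmatrix}
1 \oplus \alpha^{l_{1,1}-l_{1,0}} & 1 \oplus \alpha^{l_{1,2}-l_{1,0}} & \alpha^{(i_2-i_1)n+l_{2,0}-l_{1,0}}\big(1 \oplus \alpha^{l_{2,1}-l_{2,0}}\big)\\
1 \oplus \alpha^{2(l_{1,1}-l_{1,0})} & 1 \oplus \alpha^{2(l_{1,2}-l_{1,0})} & \alpha^{2((i_2-i_1)n+l_{2,0}-l_{1,0})}\big(1 \oplus \alpha^{2(l_{2,1}-l_{2,0})}\big)\\
1 \oplus \alpha^{3(l_{1,1}-l_{1,0})} & 1 \oplus \alpha^{3(l_{1,2}-l_{1,0})} & \alpha^{3((i_2-i_1)n+l_{2,0}-l_{1,0})}\big(1 \oplus \alpha^{3(l_{2,1}-l_{2,0})}\big)
\end{pmatrix} \tag{22}$$

and, for any $1 \le i_2 < i_3 \le m-1$, $0 \le l_{1,0} < l_{1,1} \le n-1$, $0 \le l_{2,0} < l_{2,1} \le n-1$ and $0 \le l_{3,0} < l_{3,1} \le n-1$, the following matrix is invertible:

$$\begin{pmatrix}
1 \oplus \alpha^{l_{1,1}-l_{1,0}} & \alpha^{i_2 n+l_{2,0}-l_{1,0}}\big(1 \oplus \alpha^{l_{2,1}-l_{2,0}}\big) & \alpha^{i_3 n+l_{3,0}-l_{1,0}}\big(1 \oplus \alpha^{l_{3,1}-l_{3,0}}\big)\\
1 \oplus \alpha^{2(l_{1,1}-l_{1,0})} & \alpha^{2(i_2 n+l_{2,0}-l_{1,0})}\big(1 \oplus \alpha^{2(l_{2,1}-l_{2,0})}\big) & \alpha^{2(i_3 n+l_{3,0}-l_{1,0})}\big(1 \oplus \alpha^{2(l_{3,1}-l_{3,0})}\big)\\
1 \oplus \alpha^{3(l_{1,1}-l_{1,0})} & \alpha^{3(i_2 n+l_{2,0}-l_{1,0})}\big(1 \oplus \alpha^{3(l_{2,1}-l_{2,0})}\big) & \alpha^{3(i_3 n+l_{3,0}-l_{1,0})}\big(1 \oplus \alpha^{3(l_{3,1}-l_{3,0})}\big)
\end{pmatrix} \tag{23}$$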
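The invertibility test of Theorem 6.2 is a 3 × 3 determinant over GF(2)[x]/f(x). The Python sketch below (our own helper names, not the authors' software) evaluates it without division, so it applies both when f(x) is irreducible and when f(x) = M_p(x) is reducible; an element of the ring is invertible exactly when it is coprime to f(x). Only (22) is coded; (23) differs only in the matrix entries.

```python
from itertools import combinations, permutations


def pmod(a, f):
    """Reduce a binary polynomial (int, bit k = coefficient of x^k) modulo f."""
    df = f.bit_length()
    while a and a.bit_length() >= df:
        a ^= f << (a.bit_length() - df)
    return a


def pmul(a, b, f):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return pmod(r, f)


def pgcd(a, b):
    while b:
        a, b = b, pmod(a, b)
    return a


def det(M, f):
    """Leibniz expansion in GF(2)[x]/f(x): no division, so f(x) may be reducible."""
    total = 0
    for perm in permutations(range(len(M))):
        term = 1
        for row, col in enumerate(perm):
            term = pmul(term, M[row][col], f)
        total ^= term
    return total


def apow(e, f, order):
    """alpha^e for alpha = x mod f(x) with alpha^order = 1 (order = p for M_p(x)); e may be negative."""
    return pmod(1 << (e % order), f)


def matrix_22(i1, i2, row1, row2, n, f, order):
    (l10, l11, l12), (l20, l21) = row1, row2
    d = (i2 - i1) * n + l20 - l10
    return [[1 ^ apow(k * (l11 - l10), f, order),
             1 ^ apow(k * (l12 - l10), f, order),
             pmul(apow(k * d, f, order), 1 ^ apow(k * (l21 - l20), f, order), f)]
            for k in (1, 2, 3)]


def condition_22(m, n, f, order):
    """True if every matrix (22) is invertible, i.e. its determinant is coprime to f(x)."""
    for i1 in range(m):
        for i2 in range(m):
            if i1 == i2:
                continue
            for row1 in combinations(range(n), 3):
                for row2 in combinations(range(n), 2):
                    d = det(matrix_22(i1, i2, row1, row2, n, f, order), f)
                    if pgcd(d, f) != 1:
                        return False
    return True


print(condition_22(3, 5, (1 << 19) - 1, 19))  # f(x) = M_19(x)
```

Lemma 6.1 reports that such checks succeed for irreducible M_p(x) with 19 ≤ p ≤ 227; the same function, with order = p, can be pointed at the reducible cases of Table 4.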



Consider f(x) = M_p(x). We tested all primes p up to p = 227 such that 2 is primitive in GF(p) (i.e., f(x) is irreducible), and found that the matrices given by (22) and (23) are invertible in all instances. Thus, we have the following lemma:

Lemma 6.1 Consider the code C^(1)(m, n, 1, 3; M_p(x)) given by Construction 6.1 such that M_p(x) is irreducible (or, equivalently, 2 is primitive in GF(p)). Then, for 19 ≤ p ≤ 227, code C^(1)(m, n, 1, 3; M_p(x)) is PMDS.
Prime  m   n   PMDS?
17     4   4   NO
23     3   7   NO
       ?   5   YES
31     5   6   NO
       ?   5   NO
41     5   8   NO
       6   6   YES
       8   5   YES
43     5   8   NO
       6   7   NO
47     4   11  YES
       5   9   YES
71     7   10  YES
       8   8   YES
       10  7   YES
73     6   12  NO
       7   10  NO
       8   9   NO
       9   8   NO
79     6   13  YES
       7   11  YES
       8   9   YES
89     8   11  NO
       9   9   NO
       11  8   NO
97     8   12  YES
       10  9   YES
       12  8   YES
103    9   11  YES
       10  10  YES
       11  9   YES
109    9   12  YES
       10  10  YES
       12  9   YES
113    10  11  NO
       11  10  NO
       12  9   NO
127    11  11  NO
       13  9   NO
137    11  12  YES
       12  11  YES
       13  10  YES
       15  9   YES
       16  8   YES
151    15  10  NO
       16  9   NO
157    12  13  YES
       13  12  ?
       16  9   YES
167    16  10  YES
191    17  11  YES
193    16  12  YES
199    16  12  YES
223    17  13  YES
229    16  14  YES
       28  8   YES
233    23  10  YES
239    26  9   YES
241    24  10  NO
251    25  10  YES
257    16  16  NO
       32  8   NO

Table 4: Some codes C^(1)(m, n, 1, 3; M_p(x)) such that p ≤ 257 and M_p(x) is not irreducible






We leave as an open problem whether codes C^(1)(m, n, 1, 3; M_p(x)) are PMDS for every p such that M_p(x) is irreducible (this result holds for codes C(m, n, 1, 3; M_p(x)) by Theorem 4.2). For values of p such that 2 is not primitive in GF(p), some results are tabulated in Table 4 for different values of m and n. This table is very similar to Table 3.

Comparing Tables 3 and 4, we can see that for values of p, m and n for which C^(1)(m, n, 1, 3; M_p(x)) is PMDS, C(m, n, 1, 3; M_p(x)) is also PMDS. However, for p = 23, C(3, 7, 1, 3; M_p(x)) is PMDS but C^(1)(3, 7, 1, 3; M_p(x)) is not; for p = 41, C(5, 8, 1, 3; M_p(x)) is PMDS but C^(1)(5, 8, 1, 3; M_p(x)) is not; and for p = 113, C(10, 11, 1, 3; M_p(x)), C(11, 10, 1, 3; M_p(x)) and C(12, 9, 1, 3; M_p(x)) are PMDS but C^(1)(10, 11, 1, 3; M_p(x)), C^(1)(11, 10, 1, 3; M_p(x)) and C^(1)(12, 9, 1, 3; M_p(x)) are not.



7 A Simplified Construction 

In this section we present a construction that is an alternative to codes C(m, n, 1, s; f(x)) for 1 ≤ s ≤ 2. In the case of s = 2, the new construction can correct the situation depicted
at the left of Figure EJ that is, two pairs of erasures in two different rows. It cannot correct 
the situation at the right of Figure EJ i.e., three erasures in the same row. This is a tradeoff, 
since the new construction, as we will see, uses a smaller finite field or ring. Explicitly: 

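Construction 7.1 Consider the binary polynomials modulo f(x), where either f(x) is irreducible or f(x) = M_p(x), and let max{m, n} ≤ e(f(x)), where e(f(x)) is the exponent of f(x). Let C^(2)(m, n, 1, 2; f(x)) be the code whose (m + 2) × mn parity-check matrix is

$$\mathcal{H}^{(2)}(m,n,1,2)=\begin{pmatrix}
1\;1\;\cdots\;1 & 0\;0\;\cdots\;0 & \cdots & 0\;0\;\cdots\;0\\
0\;0\;\cdots\;0 & 1\;1\;\cdots\;1 & \cdots & 0\;0\;\cdots\;0\\
\vdots & \vdots & \ddots & \vdots\\
0\;0\;\cdots\;0 & 0\;0\;\cdots\;0 & \cdots & 1\;1\;\cdots\;1\\
1\;\alpha\;\cdots\;\alpha^{n-1} & 1\;\alpha\;\cdots\;\alpha^{n-1} & \cdots & 1\;\alpha\;\cdots\;\alpha^{n-1}\\
1\;\alpha\;\cdots\;\alpha^{n-1} & \alpha\;\alpha^{2}\;\cdots\;\alpha^{n} & \cdots & \alpha^{m-1}\;\alpha^{m}\;\cdots\;\alpha^{m+n-2}
\end{pmatrix}$$

where f(α) = 0, and C^(2)(m, n, 1, 1; f(x)) is the code whose (m + 1) × mn parity-check matrix is given by the first m + 1 rows of H^(2)(m, n, 1, 2).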

The following example illustrates Construction 7.1.

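Example 7.1 Consider codes C^(2)(3, 5, 1, 2; M_5(x)) and C^(2)(5, 3, 1, 2; M_5(x)). Then, since α^5 = 1, their respective parity-check matrices are

$$\mathcal{H}^{(2)}(3,5,1,2)=\left(\begin{array}{*{15}{c}}
1&1&1&1&1&0&0&0&0&0&0&0&0&0&0\\
0&0&0&0&0&1&1&1&1&1&0&0&0&0&0\\
0&0&0&0&0&0&0&0&0&0&1&1&1&1&1\\
1&\alpha&\alpha^{2}&\alpha^{3}&\alpha^{4}&1&\alpha&\alpha^{2}&\alpha^{3}&\alpha^{4}&1&\alpha&\alpha^{2}&\alpha^{3}&\alpha^{4}\\
1&\alpha&\alpha^{2}&\alpha^{3}&\alpha^{4}&\alpha&\alpha^{2}&\alpha^{3}&\alpha^{4}&1&\alpha^{2}&\alpha^{3}&\alpha^{4}&1&\alpha
\end{array}\right)$$

and

$$\mathcal{H}^{(2)}(5,3,1,2)=\left(\begin{array}{*{15}{c}}
1&1&1&0&0&0&0&0&0&0&0&0&0&0&0\\
0&0&0&1&1&1&0&0&0&0&0&0&0&0&0\\
0&0&0&0&0&0&1&1&1&0&0&0&0&0&0\\
0&0&0&0&0&0&0&0&0&1&1&1&0&0&0\\
0&0&0&0&0&0&0&0&0&0&0&0&1&1&1\\
1&\alpha&\alpha^{2}&1&\alpha&\alpha^{2}&1&\alpha&\alpha^{2}&1&\alpha&\alpha^{2}&1&\alpha&\alpha^{2}\\
1&\alpha&\alpha^{2}&\alpha&\alpha^{2}&\alpha^{3}&\alpha^{2}&\alpha^{3}&\alpha^{4}&\alpha^{3}&\alpha^{4}&1&\alpha^{4}&1&\alpha
\end{array}\right).$$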



The following lemma is immediate: 

Lemma 7.1 The code C^(2)(m, n, 1, 1; f(x)) given by Construction 7.1 is PMDS.

Comparing Lemmas 5.1 and 7.1, both C(m, n, 1, 1; f(x)) and C^(2)(m, n, 1, 1; g(x)) are PMDS, where f(x) and g(x) are either irreducible or of the form M_p(x) for some prime p. However, the conditions on C^(2)(m, n, 1, 1; g(x)) are less stringent. For instance, if we consider the codes of Example 7.1 for M_p(x), we need at least p = 17 for C(3, 5, 1, 1; M_p(x)) and C(5, 3, 1, 1; M_p(x)), while we may take p = 5 for C^(2)(3, 5, 1, 1; M_p(x)) and C^(2)(5, 3, 1, 1; M_p(x)). Thus, although we are using a smaller field or ring, the PMDS property is not lost. This is not the case for codes C^(2)(m, n, 1, 2; f(x)): restricted to the n columns of row i, the last row of H^(2)(m, n, 1, 2) is α^i times the row above it, so three erasures in the same row cannot be recovered. Hence these codes are not (1;2)-erasure correcting (and are not PMDS). However, they are (1;1,1)-erasure correcting, as stated in the following lemma:

Lemma 7.2 The code C^(2)(m, n, 1, 2; f(x)) given by Construction 7.1 is (1;1,1)-erasure correcting.

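Proof: Assume that we have two erasures in locations $j_0$ and $j_1$ of row $i_0$ and two erasures in locations $\ell_0$ and $\ell_1$ of row $i_1$, where $0 \le i_0 < i_1 \le m-1$, $0 \le j_0 < j_1 \le n-1$ and $0 \le \ell_0 < \ell_1 \le n-1$. Using the parity-check matrix H^(2)(m, n, 1, 2) as given in Construction 7.1, these four erasures can be recovered if and only if

$$\det\begin{pmatrix}
1 & 1 & 0 & 0\\
0 & 0 & 1 & 1\\
\alpha^{j_0} & \alpha^{j_1} & \alpha^{\ell_0} & \alpha^{\ell_1}\\
\alpha^{i_0+j_0} & \alpha^{i_0+j_1} & \alpha^{i_1+\ell_0} & \alpha^{i_1+\ell_1}
\end{pmatrix}$$

is invertible. By elementary column operations, this determinant equals $\alpha^{i_0+j_0+\ell_0}\big(1 \oplus \alpha^{j_1-j_0}\big)\big(1 \oplus \alpha^{\ell_1-\ell_0}\big)\big(1 \oplus \alpha^{i_1-i_0}\big)$, so it is invertible if and only if $1 \oplus \alpha^{j_1-j_0}$, $1 \oplus \alpha^{\ell_1-\ell_0}$ and $1 \oplus \alpha^{i_1-i_0}$ are invertible. But this is certainly the case if f(x) is irreducible or f(x) = M_p(x), since $j_1 - j_0$, $\ell_1 - \ell_0$ and $i_1 - i_0$ are positive and smaller than the exponent of f(x). □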
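Lemma 7.2 can also be checked exhaustively for the codes of Example 7.1. The Python sketch below (our own helper names) works in GF(2)[x]/M_5(x), builds the column of H^(2)(m, n, 1, 2) for each erased entry, computes the square submatrix determinant by Leibniz expansion, and tests invertibility as coprimality with M_5(x). It also shows that three erasures in one row give a zero determinant.

```python
from itertools import combinations, permutations

F5 = 0b11111  # M_5(x) = 1 + x + x^2 + x^3 + x^4, so alpha^5 = 1 as in Example 7.1


def pmod(a, f=F5):
    df = f.bit_length()
    while a and a.bit_length() >= df:
        a ^= f << (a.bit_length() - df)
    return a


def pmul(a, b, f=F5):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return pmod(r, f)


def pgcd(a, b):
    while b:
        a, b = b, pmod(a, b)
    return a


def det(M, f=F5):
    """Leibniz expansion; signs disappear in characteristic 2."""
    total = 0
    for perm in permutations(range(len(M))):
        term = 1
        for row, col in enumerate(perm):
            term = pmul(term, M[row][col], f)
        total ^= term
    return total


def column(i, j, m, f=F5):
    """Column of H^(2)(m, n, 1, 2) for entry (i, j): e_i, then alpha^j and alpha^(i+j)."""
    return [1 if r == i else 0 for r in range(m)] + [pmod(1 << j, f), pmod(1 << (i + j), f)]


def pattern_det(cells, m, f=F5):
    """Determinant of the square submatrix on the erased cells (all-zero rows dropped)."""
    used = sorted({i for i, _ in cells}) + [m, m + 1]
    cols = [column(i, j, m, f) for i, j in cells]
    return det([[c[r] for c in cols] for r in used], f)


def is_11_erasure_correcting(m, n, f=F5):
    for i0, i1 in combinations(range(m), 2):
        for j0, j1 in combinations(range(n), 2):
            for l0, l1 in combinations(range(n), 2):
                d = pattern_det([(i0, j0), (i0, j1), (i1, l0), (i1, l1)], m, f)
                if pgcd(d, f) != 1:
                    return False
    return True


print(is_11_erasure_correcting(3, 5), is_11_erasure_correcting(5, 3))
print(pattern_det([(1, 0), (1, 2), (1, 4)], 3))  # three erasures in row 1
```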



Lemma 7.2 is important in applications. Let us compare it with C(m, n, 1, 2; f(x)) codes that are PMDS as given in Subsection 5.2. For the sake of discussion, let us assume that max{m, n} ≤ 15, a situation that covers some practical applications. In the case of C(m, n, 1, 2; f(x)), using Table 1, if n = 15 we would need to operate on the field GF(2^16). If we use a code C^(2)(m, n, 1, 2; f(x)), we can take the finite field GF(2^4) as given by a primitive polynomial, which has exponent 15. If we use GF(2^5) with a primitive polynomial, we can increase m to 16, a value convenient in applications. If we use rings generated by M_p(x) and we want m < 17 and n < 17, by Table 2 we may use p = 257 for a PMDS code C(m, n, 1, 2; M_257(x)). If we just implement a (1;1,1)-erasure correcting code for the same values of m and n, we can do it with a code C^(2)(m, n, 1, 2; M_17(x)).

Let us point out that Construction 7.1 is closely related to Generalized Concatenated (GC) codes [5] [13]. For descriptions of GC codes, see also [9] [13] [2] and the references therein. Implementations of GC codes are given by two-level ECC schemes [1] [10] [34], later improved
in the two-level p2] [H] and the multilevel [38] Integrated Interleaving schemes. 

Using ideas similar to the ones of GC codes, we can extend Construction 7.1 to codes that are (1;1,1,...,1)-erasure correcting, with s ones after the semicolon (we denote them C^(2)(m, n, 1, s; f(x))), as well as to other combinations using horizontal and vertical codes, but for reasons of space we omit them here. Moreover, as we will see in the next section, there is not much gain for codes C^(2)(m, n, 1, s; f(x)) with s ≥ 3 with respect to codes C^(2)(m, n, 1, 2; f(x)) in a mixed environment of catastrophic failures and hard errors.

8 Probability of Data Loss After One Disk Failure 

In this section, we assume that a catastrophic device failure has occurred. We will make 
a number of assumptions and we will compute the probability of data loss for the different 
schemes presented in the paper as a function of the raw error probability p (a parameter 
that, as we have discussed in the Introduction, degrades with time and with the number 
of writes for SSDs). Specifically, we will compare (1;2) PMDS codes and (1;1,1)-erasure
correcting codes, since both have the same redundancy (but the former is implemented over 
a larger field). We assume that the information in each SSD is stored in pages, where each 
page has size 4K and there are eight 512B sectors per page. Further, we assume that each 
sector is protected by a t-bit error-correcting code, like a BCH code (for instance, t = 15) . 
Each SSD device has M pages. For example, if a device has size 32 G, it has 8 million pages. 
We assume that stripes are rows of pages in an m x n block. 

Since we assume that one of the n devices has failed, if exactly one hard error has occurred 
in at least three different stripes of an m-stripe block, we will have data loss. If a (1;2) 
PMDS code is used, three hard errors in the same stripe will also cause data loss, while if 
a (1;1,1)-erasure correcting code is used, two hard errors in the same stripe are enough to
cause data loss. 

As stated above, each codeword is in a BCH code with 512 information bytes (4096 bits). The BCH code can correct up to t bit errors, so the redundancy is 13t bits, giving 195 bits for t = 15.
We first compute the probability P that a codeword cannot be decoded. This occurs whenever t + 1 or more errors occur, and we assume that this event is always detected, either by the BCH code itself or by the CRC. Therefore,

$$P = \sum_{i=t+1}^{4096+13t} \binom{4096+13t}{i} p^{i} (1-p)^{4096+13t-i}.$$

Thus, since there are 8 sectors per page, the probability that at least one codeword in a page is not corrected (i.e., a hard error) is

$$P_H = 1 - (1-P)^{8}.$$

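Since one of the n devices has failed, each stripe has n − 1 surviving pages, and the probability of exactly one hard error in a stripe is

$$P_{HR=1} = (n-1)\,P_H\,(1-P_H)^{n-2}.$$

The probability of more than j hard errors in a stripe is

$$P_{HR>j} = \sum_{i=j+1}^{n-1} \binom{n-1}{i} P_H^{i} (1-P_H)^{n-1-i}.$$

The probability of exactly one hard error in at least three of the m stripes in a block, the remaining stripes being free of hard errors, is then

$$P_{S_m,1,3} = \sum_{j=3}^{m} \binom{m}{j} \left(P_{HR=1}\right)^{j} (1-P_H)^{(n-1)(m-j)}.$$

The probability of at least j + 1 hard errors in any of the m stripes of the block is

$$P_{S_m,j+1,1} = m\,P_{HR>j}.$$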

The probability of data loss in an m-stripe block of a (1;1,1)-erasure correcting code is then given by

$$P_{S(1;1,1)EC} = P_{S_m,1,3} + P_{S_m,2,1},$$

while the probability of data loss in an m-stripe block of a (1;2) PMDS code is given by

$$P_{S(1;2)PMDS} = P_{S_m,1,3} + P_{S_m,3,1}.$$
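The chain of formulas above can be evaluated numerically. The Python sketch below is our own (function names are hypothetical). It sums the binomial tail for P directly in log space, because 1 minus the sum of the first t + 1 terms would lose all precision at these magnitudes, and it uses log1p/expm1 for the 1 − (1 − q)^k expressions, including the block-level probability 1 − (1 − P_S)^{M/m}. The number of devices n is not fixed in this part of the text; the usage line assumes n = 6, together with m = 16, t = 15 and M/m = 500,000 taken from the text.

```python
import math


def codeword_failure(p, t, k_bits=4096):
    """P: probability that more than t of the k_bits + 13t codeword bits are in error."""
    N = k_bits + 13 * t
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(t + 1, N + 1):  # sum the tail directly to avoid cancellation
        log_term = (math.lgamma(N + 1) - math.lgamma(i + 1) - math.lgamma(N - i + 1)
                    + i * log_p + (N - i) * log_q)
        total += math.exp(log_term)
    return total


def one_minus_pow(q, k):
    """1 - (1 - q)**k, accurate even when q is tiny."""
    return -math.expm1(k * math.log1p(-q))


def data_loss(p, n, m, blocks, t=15, sectors_per_page=8):
    P = codeword_failure(p, t)
    PH = one_minus_pow(P, sectors_per_page)
    PHR1 = (n - 1) * PH * (1 - PH) ** (n - 2)

    def PHR_more_than(j):
        return sum(math.comb(n - 1, i) * PH ** i * (1 - PH) ** (n - 1 - i)
                   for i in range(j + 1, n))

    PS13 = sum(math.comb(m, j) * PHR1 ** j * (1 - PH) ** ((n - 1) * (m - j))
               for j in range(3, m + 1))
    PS_ec = PS13 + m * PHR_more_than(1)    # P_{S_m,1,3} + P_{S_m,2,1}
    PS_pmds = PS13 + m * PHR_more_than(2)  # P_{S_m,1,3} + P_{S_m,3,1}
    return {"P": P, "P_H": PH, "P_S,1,3": PS13,
            "P_dL_EC": one_minus_pow(PS_ec, blocks),
            "P_dL_PMDS": one_minus_pow(PS_pmds, blocks)}


result = data_loss(1e-4, n=6, m=16, blocks=500_000)  # n = 6 is an assumption
print(result)
```

With these assumptions, the sketch gives P ≈ 4.1E-20 and a probability of data loss of about 8.6E-30 for the (1;1,1)-erasure correcting code at p = .0001, in line with Table 5.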



We can now compute the probability of data loss for both a (1;1,1)-erasure correcting code and a (1;2) PMDS code. Data loss occurs each time at least one m-stripe block has experienced data loss. Since we have assumed that there are M pages per device and that each block has m stripes, there are M/m blocks (for 32G SSDs and m = 16, M/m = 500,000), and we obtain, for a (1;1,1)-erasure correcting code,

$$P_{dL(1;1,1)EC} = 1 - \left(1 - P_{S(1;1,1)EC}\right)^{M/m},$$

and for a (1;2) PMDS code,

$$P_{dL(1;2)PMDS} = 1 - \left(1 - P_{S(1;2)PMDS}\right)^{M/m}.$$

Looking at the probabilities of data loss in an m-stripe block for m = 16 and 32G devices in Table 5, we can see that in general $P_{S(1;1,1)EC}$ is dominated by $P_{S_m,2,1}$ (i.e., $P_{S(1;1,1)EC} \approx P_{S_m,2,1}$), while $P_{S(1;2)PMDS}$ is dominated by $P_{S_m,1,3}$ (i.e., $P_{S(1;2)PMDS} \approx P_{S_m,1,3}$). For that reason, increasing s brings little benefit: the probability of data loss of a (1;1,1,...,1)-erasure correcting code is basically the same for any s ≥ 2 when a whole device has failed. In particular, for s = m, we have RAID 6. Of course, RAID 6 can tolerate a second device failure, but any hard error in the case of two device failures will cause data loss.

Another conclusion from Table 5 is the advantage of using a (1;2) PMDS code over a (1;1,1)-erasure correcting code, both codes having the same number of parity entries. As stated in the Introduction, the bit error probability p degrades as the system ages. So, a natural question is: if we are monitoring p, which value allows a reasonable expectation of not experiencing data loss? For instance, when we reach p = .0007, according to Table 5, the probability of data loss in case a device fails is 7.8E-5 for the (1;1,1)-erasure correcting code. This may be viewed as less than one in ten thousand systems experiencing data loss given that a device has failed, which may be acceptable (depending on the application). However, if we use a (1;2) PMDS code, even at p = .0008 the probability of data loss is 6.3E-6, more than an order of magnitude lower than the 7.8E-5 of the (1;1,1)-erasure correcting code at p = .0007. So, the system is more reliable and tolerates further degradation of the parameter p, increasing its lifetime.

9 Conclusions

We have presented two constructions of codes that are suitable for a flash array type of architecture, in which hard errors co-exist with catastrophic device failures. We have presented specific codes that are useful in applications. Necessary and sufficient conditions for codes satisfying an optimality criterion were given.



p      P        P_H      P_S16,1,3  P_S16,2,1  P_S16,3,1  P_S(1;1,1)EC  P_S(1;2)PMDS  P_dL(1;1,1)EC  P_dL(1;2)PMDS
.0001  4.1E-20  3.3E-19  2.5E-51    5.3E-35    5.7E-54    1.7E-35       2.5E-51       8.6E-30        1.2E-45
.0002  1.8E-15  1.4E-14  2.1E-37    3.3E-26    4.8E-40    3.3E-26       2.1E-37       1.7E-20        1.1E-31
.0003  7.9E-13  6.3E-12  1.8E-29    6.4E-21    4.1E-32    6.4E-21       1.8E-29       3.2E-15        8.9E-24
.0004  5.3E-11  7.9E-13  5.3E-24    2.9E-17    1.2E-26    2.9E-17       5.3E-24       1.4E-11        2.7E-18
.0005  1.3E-9   1.0E-8   7.2E-20    1.6E-14    3.1E-19    1.6E-14       7.2E-20       8.2E-9         3.6E-14
.0006  1.6E-8   1.3E-7   1.4E-16    2.5E-12    3.1E-19    2.5E-12       1.4E-16       1.3E-6         6.9E-11
.0007  1.2E-7   9.9E-7   6.8E-14    1.6E-10    1.6E-16    1.6E-10       6.8E-14       7.8E-5         3.4E-8
.0008  7.0E-7   5.6E-6   1.3E-11    5.1E-9     2.9E-14    5.1E-9        1.3E-11       2.5E-3         6.3E-6
.0009  3.1E-6   2.5E-5   1.1E-9     1.0E-9     2.5E-12    1.0E-7        1.1E-9        .05            5.4E-4
.001   1.1E-5   9.0E-5   5.2E-8     1.3E-6     1.2E-10    1.4E-6        5.2E-8        .5             .026

Table 5: Probabilities of data loss for (1;1,1)-erasure correcting codes and (1;2) PMDS codes for different values of bit error probability p in the presence of a catastrophic device failure



A Appendix 



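Lemma A.1 Let $\gamma_0, \gamma_1, \ldots, \gamma_{s-1}$ be distinct elements in a field or ring of characteristic 2. Consider the $s \times s$ matrix

$$\Gamma = \begin{pmatrix}
\gamma_0 & \gamma_1 & \cdots & \gamma_{s-1}\\
\gamma_0^{2} & \gamma_1^{2} & \cdots & \gamma_{s-1}^{2}\\
\gamma_0^{4} & \gamma_1^{4} & \cdots & \gamma_{s-1}^{4}\\
\vdots & \vdots & & \vdots\\
\gamma_0^{2^{s-1}} & \gamma_1^{2^{s-1}} & \cdots & \gamma_{s-1}^{2^{s-1}}
\end{pmatrix}.$$

Then,

$$\det\Gamma = \prod_{\emptyset \neq S \subseteq \{\gamma_0, \gamma_1, \ldots, \gamma_{s-1}\}} \ \bigoplus_{\gamma_i \in S} \gamma_i.$$

Proof: We proceed by induction on s. The result is certainly true for s = 1. Consider the determinant of the matrix obtained by replacing $\gamma_0$ in the first column of $\Gamma$ by $x$, i.e.,

$$h(x) = \det\begin{pmatrix}
x & \gamma_1 & \cdots & \gamma_{s-1}\\
x^{2} & \gamma_1^{2} & \cdots & \gamma_{s-1}^{2}\\
x^{4} & \gamma_1^{4} & \cdots & \gamma_{s-1}^{4}\\
\vdots & \vdots & & \vdots\\
x^{2^{s-1}} & \gamma_1^{2^{s-1}} & \cdots & \gamma_{s-1}^{2^{s-1}}
\end{pmatrix}.$$

Since $h(x)$ has degree $2^{s-1}$, it has at most $2^{s-1}$ zeros. Notice that if $S$ is one of the $2^{s-1}$ subsets of $\{\gamma_1, \gamma_2, \ldots, \gamma_{s-1}\}$ (including the empty subset), then $\bigoplus_{\gamma_i \in S} \gamma_i$ is a zero of $h(x)$ (the element corresponding to the empty set being 0), due to the linearity of the square operation in a field of characteristic 2. Therefore, we can write

$$h(x) = C \prod_{S \subseteq \{\gamma_1, \gamma_2, \ldots, \gamma_{s-1}\}} \Big(x \oplus \bigoplus_{\gamma_i \in S} \gamma_i\Big),$$

where

$$C = \det\begin{pmatrix}
\gamma_1 & \gamma_2 & \cdots & \gamma_{s-1}\\
\gamma_1^{2} & \gamma_2^{2} & \cdots & \gamma_{s-1}^{2}\\
\vdots & \vdots & & \vdots\\
\gamma_1^{2^{s-2}} & \gamma_2^{2^{s-2}} & \cdots & \gamma_{s-1}^{2^{s-2}}
\end{pmatrix}.$$

The result follows from the fact that $h(\gamma_0) = \det(\Gamma)$ and by induction on the expression of $C$ above. □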
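A matrix whose rows are successive squares, as Γ is, is usually called a Moore matrix, and Lemma A.1 gives its determinant in characteristic 2. The Python sketch below is our own check on GF(16) (with f(x) = x^4 + x + 1, an arbitrary choice): it compares the determinant, computed by Leibniz expansion, with the product of the XOR-sums over all nonempty subsets. The tuple (3, 5, 6) is linearly dependent over GF(2), since 3 ⊕ 5 = 6, so both sides vanish.

```python
from functools import reduce
from itertools import combinations, permutations

F = 0b10011  # x^4 + x + 1, so GF(2)[x]/F is GF(16); an arbitrary choice for illustration


def pmod(a, f=F):
    df = f.bit_length()
    while a and a.bit_length() >= df:
        a ^= f << (a.bit_length() - df)
    return a


def pmul(a, b, f=F):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return pmod(r, f)


def ppow(a, e, f=F):
    r = 1
    for _ in range(e):
        r = pmul(r, a, f)
    return r


def det(M, f=F):
    """Leibniz expansion; signs disappear in characteristic 2."""
    total = 0
    for perm in permutations(range(len(M))):
        term = 1
        for row, col in enumerate(perm):
            term = pmul(term, M[row][col], f)
        total ^= term
    return total


def moore_matrix(gammas, f=F):
    """Row k holds gamma_i^(2^k), k = 0, ..., s-1 (the matrix Gamma of Lemma A.1)."""
    return [[ppow(g, 2 ** k, f) for g in gammas] for k in range(len(gammas))]


def subset_sum_product(gammas, f=F):
    """Product over nonempty subsets S of the XOR-sum of the elements of S."""
    prod = 1
    for size in range(1, len(gammas) + 1):
        for S in combinations(gammas, size):
            prod = pmul(prod, reduce(lambda x, y: x ^ y, S), f)
    return prod


for gammas in [(1, 2, 4), (3, 5, 6), (7, 9, 11, 13)]:
    print(gammas, det(moore_matrix(gammas)) == subset_sum_product(gammas))
```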